FINAL DRAFT — 12 July 2007
The goal of this specification is to act as a baseline for all following LOLCODE specifications. As such, some traditionally expected language features may appear "incomplete." This is most likely deliberate, as it will be easier to add to the language than to change and introduce further incompatibilities.
Spaces are used to demarcate tokens in the language, although some keyword constructs may include spaces.
Multiple spaces and tabs are treated as single spaces and are otherwise irrelevant.
Indentation is irrelevant.
A command starts at the beginning of a line and a newline indicates the end of a command, except in special cases.
A newline will be Carriage Return (/13), a Line Feed (/10) or both (/13/10) depending on the implementing system. This is only in regards to LOLCODE code itself, and does not indicate how these should be treated in strings or files during execution.
Multiple commands can be put on a single line if they are separated by a comma (,). In this case, the comma acts as a virtual newline or a soft-command-break.
Multiple lines can be combined into a single command by including three periods (...) or the unicode ellipsis character (u2026) at the end of the line. This causes the contents of the next line to be evaluated as if it were on the same line.
Lines with line continuation can be strung together, many in a row, to allow a single command to stretch over more than one or two lines. As long as each line is ended with three periods, the next line is included, until a line without three periods is reached, at which point, the entire command may be processed.
A line with line continuation may not be followed by an empty line.
Three periods may be by themselves on a single line, in which case, the empty line is "included" in the command (doing nothing), and the next line is included as well.
A single-line comment is always terminated by a newline. Line continuation (...) and soft-command-breaks (,) after the comment (BTW
) are ignored.
Line continuation and soft-command-breaks are ignored inside quoted strings. An unterminated string literal (no closing quote) will cause an error.
(from 1.1)
Single line comments are begun by BTW
, and may occur either after a line of code, on a separate line, or following a line of code following a line separator (,).
All of these are valid single line comments:
I HAS A VAR ITZ 12 BTW VAR = 12
I HAS A VAR ITZ 12, BTW VAR = 12
I HAS A VAR ITZ 12
BTW VAR = 12
Multi-line comments are begun by OBTW
and ended with TLDR
, and should be started on their own lines, or following a line of code after a line separator.
These are valid multi-line comments:
I HAS A VAR ITZ 12
OBTW this is a long comment block
see, i have more comments here
and here
TLDR
I HAS A FISH ITZ BOB
I HAS A VAR ITZ 12, OBTW this is a long comment block
see, i have more comments here
and here
TLDR, I HAS A FISH ITZ BOB
(modified from 1.1)
All LOLCODE programs must be opened with the command HAI
. HAI
should then be followed with the current LOLCODE language version number (1.2, in this case). There is no current standard behavior for implementations to treat the version number, though.
A LOLCODE file is closed by the keyword KTHXBYE
which closes the HAI
code-block.
(to be revisited and refined)
All variable scope, as of this version, is local to the enclosing function or to the main program block. Variables are only accessible after declaration, and there is no global scope.
(from 1.1)
Variable identifiers may be in all small or lowercase letters (or a mixture of the two). They must begin with a letter and may be followed only by other letters, numbers, and underscores. No spaces, dashes, or other symbols are allowed. Variable identifiers are CASE SENSITIVE – "cheezburger", "CheezBurger" and "CHEEZBURGER" would all be different variables.
(modified from 1.1)
To declare a variable, the keyword is I HAS A
followed by the variable name. To assign the variable a value within the same statement, you can then follow the variable name with ITZ <value>
.
Assignment of a variable is accomplished with an assignment statement, <variable> R <expression>
I HAS A VAR BTW VAR is null and untyped
VAR R "THREE" BTW VAR is now a YARN and equals "THREE"
VAR R 3 BTW VAR is now a NUMBR and equals 3
(updated from 1.1)
The variable types that LOLCODE currently recognizes are: strings (YARN), integers (NUMBR), floats (NUMBAR), and booleans (TROOF) (Arrays (BUKKIT) are reserved for future expansion.) Typing is handled dynamically. Until a variable is given an initial value, it is untyped (NOOB). ~~Casting operations operate on TYPE types, as well.~~
The untyped type (NOOB) cannot be implicitly cast into any type except a TROOF. A cast into TROOF makes the variable FAIL. Any operations on a NOOB that assume another type (e.g., math) results in an error.
Explicit casts of a NOOB (untyped, uninitialized) variable are to empty/zero values for all other types.
The two boolean (TROOF) values are WIN (true) and FAIL (false). The empty string (""), an empty array, and numerical zero are all cast to FAIL. All other values evaluate to WIN.
A NUMBR is an integer as specified in the host implementation/architecture. Any contiguous sequence of digits outside of a quoted YARN and not containing a decimal point (.) is considered a NUMBR. A NUMBR may have a leading hyphen (-) to signify a negative number.
A NUMBAR is a float as specified in the host implementation/architecture. It is represented as a contiguous string of digits containing exactly one decimal point. Casting a NUMBAR to a NUMBR truncates the decimal portion of the floating point number. Casting a NUMBAR to a YARN (by printing it, for example), truncates the output to a default of two decimal places. A NUMBAR may have a leading hyphen (-) to signify a negative number.
Casting of a string to a numerical type parses the string as if it were not in quotes. If there are any non-numerical, non-hyphen, non-period characters, then it results in an error. Casting WIN to a numerical type results in "1" or "1.0"; casting FAIL results in a numerical zero.
String literals (YARN) are demarked with double quotation marks ("). Line continuation and soft-command-breaks are ignored inside quoted strings. An unterminated string literal (no closing quote) will cause an error.
Within a string, all characters represent their literal value except the colon (:), which is the escape character. Characters immediately following the colon also take on a special meaning.
The colon may also introduce more verbose escapes enclosed within some form of bracket.
<hex>
) resolves the hex number into the corresponding Unicode code point.<var>
} interpolates the current value of the enclosed variable, cast as a string.<char name>
] resolves the <char name>
in capital letters to the corresponding Unicode normative name.Array and dictionary types are currently under-specified. There is general will to unify them, but indexing and definition is still under discussion.
The TYPE type only has the values of TROOF, NOOB, NUMBR, NUMBAR, YARN, and TYPE, as bare words. They may be legally cast to TROOF (all true except for NOOB) or YARN.
TYPEs are under current review. Current sentiment is to delay defining them until user-defined types are relevant, but that would mean that type comparisons are left unresolved in the meantime.
Mathematical operators and functions in general rely on prefix notation. By doing this, it is possible to call and compose operations with a minimum of explicit grouping. When all operators and functions have known arity, no grouping markers are necessary. In cases where operators have variable arity, the operation is closed with MKAY
. An MKAY
may be omitted if it coincides with the end of the line/statement, in which case the EOL stands in for as many MKAYs
as there are open variadic functions.
Calling unary operators then has the following syntax:
<operator> <expression1>
The AN
keyword can optionally be used to separate arguments, so a binary operator expression has the following syntax:
<operator> <expression1> [AN] <expression2>
An expression containing an operator with infinite arity can then be expressed with the following syntax:
<operator> <expr1> [[[AN] <expr2>] [AN] <expr3> ...] MKAY
The basic math operators are binary prefix operators.
SUM OF <x> AN <y> BTW +
DIFF OF <x> AN <y> BTW -
PRODUKT OF <x> AN <y> BTW *
QUOSHUNT OF <x> AN <y> BTW /
MOD OF <x> AN <y> BTW modulo
BIGGR OF <x> AN <y> BTW max
SMALLR OF <x> AN <y> BTW min
<x>
and <y>
may each be expressions in the above, so mathematical operators can be nested and grouped indefinitely.
Math is performed as integer math in the presence of two NUMBRs, but if either of the expressions are NUMBARs, then floating point math takes over.
If one or both arguments are a YARN, they get interpreted as NUMBARs if the YARN has a decimal point, and NUMBRs otherwise, then execution proceeds as above.
If one or another of the arguments cannot be safely cast to a numerical type, then it fails with an error.
Boolean operators working on TROOFs are as follows:
BOTH OF <x> [AN] <y> BTW and: WIN iff x=WIN, y=WIN
EITHER OF <x> [AN] <y> BTW or: FAIL iff x=FAIL, y=FAIL
WON OF <x> [AN] <y> BTW xor: FAIL if x=y
NOT <x> BTW unary negation: WIN if x=FAIL
ALL OF <x> [AN] <y> ... MKAY BTW infinite arity AND
ANY OF <x> [AN] <y> ... MKAY BTW infinite arity OR
<x>
and <y>
in the expression syntaxes above are automatically cast as TROOF values if they are not already so.
Comparison is (currently) done with two binary equality operators:
BOTH SAEM <x> [AN] <y> BTW WIN iff x == y
DIFFRINT <x> [AN] <y> BTW WIN iff x != y
Comparisons are performed as integer math in the presence of two NUMBRs, but if either of the expressions are NUMBARs, then floating point math takes over. Otherwise, there is no automatic casting in the equality, so BOTH SAEM "3" AN 3
is FAIL.
There are (currently) no special numerical comparison operators. Greater-than and similar comparisons are done idiomatically using the minimum and maximum operators.
BOTH SAEM <x> AN BIGGR OF <x> AN <y> BTW x >= y
BOTH SAEM <x> AN SMALLR OF <x> AN <y> BTW x <= y
DIFFRINT <x> AN SMALLR OF <x> AN <y> BTW x > y
DIFFRINT <x> AN BIGGR OF <x> AN <y> BTW x < y
If <x>
in the above formulations is too verbose or difficult to compute, don't forget the automatically created IT temporary variable. A further idiom could then be:
<expression>, DIFFRINT IT AN SMALLR OF IT AN <y>
Suggestions are being accepted for coherently and convincingly english-like prefix operator names for greater-than and similar operators.
An indefinite number of YARNs may be explicitly concatenated with the SMOOSH...MKAY
operator. Arguments may optionally be separated with AN
. As the SMOOSH
expects strings as its input arguments, it will implicitly cast all input values of other types to YARNs. The line ending may safely implicitly close the SMOOSH
operator without needing an MKAY
.
Operators that work on specific types implicitly cast parameter values of other types. If the value cannot be safely cast, then it results in an error.
An expression's value may be explicitly cast with the binary MAEK
operator.
MAEK <expression> [A] <type>
Where <type>
is one of TROOF, YARN, NUMBR, NUMBAR, or NOOB. This is only for local casting: only the resultant value is cast, not the underlying variable(s), if any.
To explicitly re-cast a variable, you may create a normal assignment statement with the MAEK
operator, or use a casting assignment statement as follows:
<variable> IS NOW A <type> BTW equivalent to
<variable> R MAEK <variable> [A] <type>
The print (to STDOUT or the terminal) operator is VISIBLE
. It has infinite arity and implicitly concatenates all of its arguments after casting them to YARNs. It is terminated by the statement delimiter (line end or comma). The output is automatically terminated with a carriage return (:)), unless the final token is terminated with an exclamation point (!), in which case the carriage return is suppressed.
VISIBLE <expression> [<expression> ...][!]
There is currently no defined standard for printing to a file.
To accept input from the user, the keyword is
GIMMEH <variable>
which takes YARN for input and stores the value in the given variable.
GIMMEH
is defined minimally here as a holdover from 1.0 and because there has not been any detailed discussion of this feature. We count on the liberal casting capabilities of the language and programmer inventiveness to handle input restriction. GIMMEH
may change in a future version.
A bare expression (e.g. a function call or math operation), without any assignment, is a legal statement in LOLCODE. Aside from any side-effects from the expression when evaluated, the final value is placed in the temporary variable IT
. IT
's value remains in local scope and exists until the next time it is replaced with a bare expression.
Assignment statements have no side effects with IT
. They are generally of the form:
<variable> <assignment operator> <expression>
The variable being assigned may be used in the expression.
Flow control statements cover multiple lines and are described in the following section.
The traditional if/then construct is a very simple construct operating on the implicit IT
variable. In the base form, there are four keywords: O RLY?
, YA RLY
, NO WAI
, and OIC
.
O RLY?
branches to the block begun with YA RLY
if IT
can be cast to WIN, and branches to the NO WAI
block if IT
is FAIL. The code block introduced with YA RLY
is implicitly closed when NO WAI
is reached. The NO WAI
block is closed with OIC
. The general form is then as follows:
<expression>
O RLY?
YA RLY
<code block>
NO WAI
<code block>
OIC
while an example showing the ability to put multiple statements on a line separated by a comma would be:
BOTH SAEM ANIMAL AN "CAT", O RLY?
YA RLY, VISIBLE "J00 HAV A CAT"
NO WAI, VISIBLE "J00 SUX"
OIC
The elseif construction adds a little bit of complexity. Optional MEBBE <expression>
blocks may appear between the YA RLY and NO WAI blocks. If the <expression>
following MEBBE
is WIN, then that block is performed; if not, the block is skipped until the following MEBBE
, NO WAI
, or OIC
. The full expression syntax is then as follows:
<expression>
O RLY?
YA RLY
<code block>
[MEBBE <expression>
<code block>
[MEBBE <expression>
<code block>
...]]
[NO WAI
<code block>]
OIC
An example of this conditional is then:
BOTH SAEM ANIMAL AN "CAT"
O RLY?
YA RLY, VISIBLE "J00 HAV A CAT"
MEBBE BOTH SAEM ANIMAL AN "MAUS"
VISIBLE "NOM NOM NOM. I EATED IT."
OIC
(modified from 1.1)
The LOLCODE keyword for switches is WTF?
. The WTF?
operates on IT
as being the expression value for comparison. A comparison block is opened by OMG
and must be a literal, not an expression. (A literal, in this case, excludes any YARN containing variable interpolation (:{var}
).) Each literal must be unique. The OMG
block can be followed by any number of statements and may be terminated by a GTFO
, which breaks to the end of the the WTF
statement. If an OMG
block is not terminated by a GTFO
, then the next OMG
block is executed as is the next until a GTFO
or the end of the WTF
block is reached. The optional default case, if none of the literals evaluate as true, is signified by OMGWTF
.
WTF?
OMG <value literal>
<code block>
[OMG <value literal>
<code block> ...]
[OMGWTF
<code block>]
OIC
COLOR, WTF?
OMG "R"
VISIBLE "RED FISH"
GTFO
OMG "Y"
VISIBLE "YELLOW FISH"
OMG "G"
OMG "B"
VISIBLE "FISH HAS A FLAVOR"
GTFO
OMGWTF
VISIBLE "FISH IS TRANSPARENT"
OIC
In this example, the output results of evaluating the variable COLOR
would be:
"R":
RED FISH
"Y":
YELLOW FISH
FISH HAS A FLAVOR
"G":
FISH HAS A FLAVOR
"B":
FISH HAS A FLAVOR
none of the above:
FISH IS TRANSPARENT
Loops are currently defined more or less as they were in the original examples. Further looping constructs will be added to the language soon.
Simple loops are demarcated with IM IN YR <label>
and IM OUTTA YR <label>
. Loops defined this way are infinite loops that must be explicitly exited with a GTFO break. Currently, the <label>
is required, but is unused, except for marking the start and end of the loop.
Immature spec – subject to change:
Iteration loops have the form:
IM IN YR <label> <operation> YR <variable> [TIL|WILE <expression>]
<code block>
IM OUTTA YR <label>
Where
A function is demarked with the opening keyword HOW IZ I
and the closing keyword IF U SAY SO
. The syntax is as follows:
HOW IZ I <function name> [YR <argument1> [AN YR <argument2> …]]
<code block>
IF U SAY SO
Currently, the number of arguments in a function can only be defined as a fixed number. The <argument>
s are single-word identifiers that act as variables within the scope of the function's code. The calling parameters' values are then the initial values for the variables within the function's code block when the function is called.
Currently, functions do not have access to the outer/calling code block's variables.
Return from the function is accomplished in one of the following ways:
FOUND YR <expression>
returns the value of the expression.GTFO
returns with no value (NOOB).IF U SAY SO
), the value in IT
is returned.A function of given arity is called with:
I IZ <function name> [YR <expression1> [AN YR <expression2> [AN YR <expression3> ...]]] MKAY
That is, an expression is formed by the function name followed by any arguments. Those arguments may themselves be expressions. The expressions' values are obtained before the function is called. The arity of the functions is determined in the definition.
Last modified 07 October 2024