The following characters are considered blanks: space, newline, horizontal tabulation, carriage return, line feed and form feed. Blanks are ignored, but they separate adjacent identifiers, literals and keywords that would otherwise be read as a single identifier, literal or keyword.
Comments are introduced by the two characters (*, with no intervening blanks, and terminated by the characters *), with no intervening blanks. Comments are treated as blank characters. Comments do not occur inside string or character literals. Nested comments are handled correctly.
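The comment rules above can be illustrated with a minimal sketch. The names `answer` and `not_a_comment` are hypothetical and chosen only for this example.

```ocaml
(* An outer comment (* with a nested comment *) that continues here. *)
let answer = (* a comment acts as a blank *) 42

(* Inside a string literal, the comment opener is ordinary text. *)
let not_a_comment = "(* just text *)"
```

The second definition binds a 15-character string, since no comment starts inside a string literal.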
ident: (letter | _) {letter | 0...9 | _ | '}
letter: A...Z | a...z
Identifiers are sequences of letters, digits, _ (the underscore character), and ' (the single quote), starting with a letter or an underscore. Letters contain at least the 52 lowercase and uppercase letters from the ASCII set. The current implementation also recognizes as letters all accented characters from the ISO 8859-1 ("ISO Latin 1") set. All characters in an identifier are significant. The current implementation places no limit on the number of characters in an identifier.
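As a small illustration of the identifier rules, the sketch below uses only ASCII letters, digits, underscores and quotes. All names are hypothetical.

```ocaml
let x = 1
let x' = x + 1           (* a quote may follow the first character *)
let _scratch = 0         (* a leading underscore is allowed *)
let long_name_2 = x' * 2 (* digits and underscores may be mixed freely *)
```

Here `x` and `x'` are two distinct identifiers, because every character of an identifier counts.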
integer-literal: [-] {0...9}+ | [-] (0x | 0X) {0...9 | A...F | a...f}+ | [-] (0o | 0O) {0...7}+ | [-] (0b | 0B) {0...1}+
An integer literal is a sequence of one or more digits, optionally preceded by a minus sign. By default, integer literals are in decimal (radix 10). The following prefixes select a different radix:
Prefix | Radix |
---|---|
0x, 0X | hexadecimal (radix 16) |
0o, 0O | octal (radix 8) |
0b, 0B | binary (radix 2) |
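The radix prefixes can be compared side by side. In this sketch, with hypothetical names, the first five definitions all denote the same value, 255.

```ocaml
let dec = 255
let hex = 0xFF
let hex_lower = 0Xff     (* prefix and digits accept either case *)
let oct = 0o377
let bin = 0b11111111
let neg = -0x10          (* a minus sign may precede any radix *)
```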
float-literal: [-] {0...9}+ [. {0...9}] [(e | E) [+ | -] {0...9}+]
Floating-point decimals consist of an integer part, a decimal part and an exponent part. The integer part is a sequence of one or more digits, optionally preceded by a minus sign. The decimal part is a decimal point followed by zero, one or more digits. The exponent part is the character e or E followed by an optional + or - sign, followed by one or more digits. The decimal part or the exponent part can be omitted, but not both, to avoid ambiguity with integer literals. The interpretation of floating-point literals that fall outside the range of representable floating-point values is undefined.
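The following sketch shows a few literal forms permitted by this rule. The names are hypothetical, and each literal keeps at least one of the decimal part or the exponent part.

```ocaml
let a = 1.        (* integer part with an empty decimal part *)
let b = 1.5       (* integer part and decimal part *)
let c = 1e3       (* exponent part only, no decimal part *)
let d = 2.5E-1    (* decimal part and signed exponent *)
let e = -0.5      (* leading minus sign *)
```

Writing `1` alone would be read as an integer literal, which is why one of the two optional parts must be present.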
char-literal: ' regular-char ' | ' \ (\ | ' | n | t | b | r) ' | ' \ (0...9) (0...9) (0...9) '
Character literals are delimited by ' (single quote) characters. The two single quotes enclose either one character different from ' and \, or one of the escape sequences below:
Sequence | Character denoted |
---|---|
\\ | backslash (\) |
\' | single quote (') |
\n | newline (LF) |
\r | return (CR) |
\t | horizontal tabulation (TAB) |
\b | backspace (BS) |
\ddd | the character with ASCII code ddd in decimal |
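A short sketch of the escape sequences in character literals follows. The names are hypothetical, and `Char.code` from the standard library is used in the note below to inspect the resulting codes.

```ocaml
let newline = '\n'
let tab = '\t'
let backspace = '\b'
let backslash = '\\'
let quote = '\''
let capital_a = '\065'   (* three decimal digits giving the ASCII code *)
```

For instance, `Char.code newline` evaluates to 10, and `capital_a` is the same character as `'A'`.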
string-literal: " {string-character} "
string-character: regular-char | \ (\ | " | n | t | b | r) | \ (0...9) (0...9) (0...9)
String literals are delimited by " (double quote) characters. The two double quotes enclose a sequence of either characters different from " and \, or escape sequences from the table below:
Sequence | Character denoted |
---|---|
\\ | backslash (\) |
\" | double quote (") |
\n | newline (LF) |
\r | return (CR) |
\t | horizontal tabulation (TAB) |
\b | backspace (BS) |
\ddd | the character with ASCII code ddd in decimal |
The current implementation places no restrictions on the length of string literals.
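The same escapes apply inside string literals, as this minimal sketch suggests. The names and contents are hypothetical.

```ocaml
let greeting = "Hello,\tworld\n"   (* tab and newline escapes *)
let quoted = "say \"hi\""          (* escaped double quotes *)
let path = "C:\\temp"              (* escaped backslash *)
let abc = "\097\098\099"           (* decimal ASCII codes for a, b, c *)
```

Each escape sequence contributes exactly one character to the string, so `path` has length 7.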
infix-symbol: (= | < | > | @ | ^ | | | & | + | - | * | / | $ | %) {operator-char}
prefix-symbol: (! | ? | ~) {operator-char}
operator-char: ! | $ | % | & | * | + | - | . | / | : | < | = | > | ? | @ | ^ | | | ~
Sequences of "operator characters", such as <=> or !!, are read as a single token from the infix-symbol or prefix-symbol class. These symbols are parsed as prefix and infix operators inside expressions, but otherwise behave much as identifiers.
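Because these symbols behave much as identifiers, they can be bound like ordinary names. The sketch below gives hypothetical meanings to `<=>` and `!!`; neither meaning is predefined by the language.

```ocaml
(* An infix symbol: begins with < and continues with operator characters. *)
let ( <=> ) a b = compare a b

(* A prefix symbol: begins with ! and continues with operator characters. *)
let ( !! ) x = not x

let cmp = 1 <=> 2      (* negative, since 1 is less than 2 *)
let flipped = !! true
```

Enclosing a symbol in parentheses, as in `( <=> )`, lets it be used wherever an identifier is expected.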
The identifiers below are reserved as keywords, and cannot be employed otherwise:
and as assert asr begin class closed constraint do done downto else end exception external false for fun function functor if in include inherit land lazy let lor lsl lsr lxor match method mod module mutable new of open or parser private rec sig struct then to true try type val virtual when while with
The following character sequences are also keywords:
# & ' ( ) * , -> ? . .. .( .[ : :: := ; ;; <- = [ [| [< {< ] |] >] >} _ { | }
Lexical ambiguities are resolved according to the "longest match" rule: when a character sequence can be decomposed into two tokens in several different ways, the decomposition retained is the one with the longest first token.
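One way to observe the longest match rule is with a user-defined operator. In this sketch the operator `--` and its meaning are assumptions made for illustration: without blanks, the two minus signs form a single infix symbol instead of a subtraction followed by a negative number.

```ocaml
(* Hypothetical operator, defined here as plain subtraction. *)
let ( -- ) a b = a - b

let joined = 5--2      (* read as 5 -- 2, because -- is the longest first token *)
let spaced = 5 - -2    (* the blank separates the two minus signs *)
```

Under these definitions `joined` is 3 while `spaced` is 7, so inserting a blank changes how the text is split into tokens.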