Lexer

internal/lex. A hand-written byte-level UTF-8 lexer.

Token shape

type Token struct {
    Kind Kind        // TkInt, TkIdent, TkPlus, ...
    Lit  string      // raw lexeme
    Span diag.Span   // source location
}

Kind is an enum (internal/lex/token.go). Lit is the raw lexeme — for an IntLit like 1_000_i32 it is the literal text including underscores and suffix; semantic interpretation happens in the parser.

Categories

Category	Examples
Keywords	`fn`, `return`, `let`, `if`, `else`, `match`, `true`, `false`, `in`, `as`, `mut`
Identifiers	`foo`, `_bar`, `MyType`. (`for`, `while`, `break`, `continue` lex as identifiers — not keywords.)
Literals	`42`, `1_000`, `42_i32`, `'A'`, `3.14`, `"..."`, `true`
Operators	`+`, `.+`, `\|>`, `==`, `..`, `..=`, `=>`, `->`
Punctuation	`(`, `)`, `[`, `]`, `{`, `}`, `;`, `,`, `:`, `\|`

Notable design choices

Byte-level for ASCII, validated for non-ASCII inside string and char literals. This avoids runtime regex but accepts the full Unicode codepoint range inside '...'.
Line comments use #, not //. // is the divide-reduction operator (TkSlashSlash); -/ is subtract-reduction (TkMinusSlash). The line-comment lead-in moved to # in v0.14 so the operator surface could complete.
Block comments are nestable: /* /* nested */ still inside */. This is a deliberate divergence from C.
Spans are byte offsets into the original source, plus a 1-based line/column for diagnostics. The diagnostic renderer maps them back to the source line.

Why `for`, `while`, `break`, `continue` are not keywords

esque does not have C-style control flow; the loop primitives (tabulate, scan, iterate, iterate_until, each) replace them. To keep the surface small and let those identifiers serve as ordinary variable names, the lexer treats for, while, break, and continue as plain identifiers.

Token shape​

Categories​

Notable design choices​

Why for, while, break, continue are not keywords​

Token shape

Categories

Notable design choices

Why `for`, `while`, `break`, `continue` are not keywords