Compiler pipeline
Every compile runs through the same five stages. Most bugs originate in exactly one of them, so debugging starts with working out which.
- Lexing: `SyslLexer.scala`

  Turns a source string into a token stream. The lexer is indentation-aware: it emits explicit `INDENT`/`DEDENT`/`NEWLINE` tokens for Python-style block syntax, and it understands the inline `do`/`then` forms that let you write single-line bodies.

  Literate (`.lsysl`) files are handled here too: the lexer consumes only the indented code blocks and discards the Markdown prose.

- Parsing: `SyslParser.scala`

  Recursive-descent parser that produces the untyped AST in `SyslAST.scala`. Handles every syntactic form the language has: module declarations, imports, attributes, generics, traits, `match`, `if`, `for`, closures, literate prose-interleaved source, and conditional compilation markers (left as AST nodes for the driver to resolve).

- Analysis: `SyslAnalyzer.scala`

  The biggest and most interesting stage. It does:

  - Name resolution (scopes, imports, module-qualified references).
  - Type inference and type checking, including generic monomorphisation and trait-method dispatch.
  - Exhaustiveness checks on `match` over tagged unions.
  - Contract typing (`require`, `ensure`, loop `variant`/`invariant`, struct `invariant`, `old(...)`).
  - Operator overloading resolution: `a + b` on a user type becomes a call to `Add.add(a, b)` resolved to a specific `impl` block.
  - Emits the typed AST in `SyslTypedAST.scala`.

- Backend: `SyslInterpreter.scala` / `SyslTriscCodegen.scala` / `SyslLLVMCodegen.scala`

  All three consume the typed AST. The interpreter walks it directly. The TRISC codegen emits assembly in the TRISC ISA and hands it to the assembler. The LLVM codegen emits `.ll` text suitable for `clang`.

  The backends share a lowering style (struct returns as hidden pointers, strings as fat pointers, refs with a 16-byte header), but each has target-specific twists (r1 calling convention on TRISC, `i8*` envs for closures on LLVM, etc.).

- Link: the host toolchain

  For TRISC, `triscCli` links `.tof` files with a user-supplied linker script into a bootable image. For LLVM, `clang` takes over. For the interpreter, there is no link step: the typed AST is the executable.
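The indentation handling described under Lexing can be pictured with a small stand-alone sketch. This is not the code in `SyslLexer.scala`: only the token names `INDENT`, `DEDENT` and `NEWLINE` come from the text above, while `IndentLexerSketch`, `tokenize` and `Word` are hypothetical names chosen for illustration. The sketch assumes space-only indentation, skips blank lines, and does not report dedents that land between two open levels.

```scala
// Illustrative only: a stack-based INDENT/DEDENT lexer in the Python style.
// All names here are hypothetical and not taken from SyslLexer.scala.
object IndentLexerSketch {
  sealed trait Token
  case object Indent extends Token
  case object Dedent extends Token
  case object Newline extends Token
  final case class Word(text: String) extends Token

  def tokenize(source: String): List[Token] = {
    val out = List.newBuilder[Token]
    // Widths of the currently open blocks, innermost first. 0 is the top level.
    var stack = List(0)

    for (line <- source.split("\n") if line.trim.nonEmpty) {
      val width = line.takeWhile(_ == ' ').length
      if (width > stack.head) {
        // Deeper than the innermost block: open exactly one new block.
        stack = width :: stack
        out += Indent
      } else {
        // Shallower: close blocks until the widths line up again.
        while (width < stack.head) {
          stack = stack.tail
          out += Dedent
        }
      }
      // Stand-in for real tokenisation: split the line on whitespace.
      line.trim.split("\\s+").foreach(w => out += Word(w))
      out += Newline
    }

    // Close every block that is still open at end of input.
    while (stack.head > 0) {
      stack = stack.tail
      out += Dedent
    }
    out.result()
  }
}
```

Calling `IndentLexerSketch.tokenize("if x\n  y\nz")` yields the words of each line followed by `Newline`, with one `Indent` before `y` and one `Dedent` before `z`. Keeping the open widths on a stack is what lets a single shallow line close several nested blocks at once.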
Where each concern ends up
| Concern | Stage | File |
|---|---|---|
| Indentation blocks | Lexing | `SyslLexer.scala` |
| Operator precedence | Parsing | `SyslParser.scala` |
| Literate `.lsysl` handling | Lexing | `SyslLexer.scala` |
| Conditional `#if` | Driver | `SyslDriver.scala` |
| Type inference | Analysis | `SyslAnalyzer.scala` |
| Generic instantiation | Analysis | `SyslAnalyzer.scala` |
| Trait/impl dispatch | Analysis | `SyslAnalyzer.scala` |
| `match` exhaustiveness | Analysis | `SyslAnalyzer.scala` |
| Contract lowering | Analysis + backend | `SyslAnalyzer.scala` + each codegen |
| String / ref layout | Backends | `SyslTriscCodegen.scala`, `SyslLLVMCodegen.scala` |
| Stack vs heap closures | Backends | same |
| `volatile` semantics | LLVM backend | `SyslLLVMCodegen.scala` |
| Multi-module ordering | Driver | `SyslDriver.scala` |
| `.smeta` cross-module cache | Driver + analyser | `ModuleMeta.scala` |
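The "Trait/impl dispatch" row includes the operator rewrite performed during analysis, where `a + b` on a user type becomes `Add.add(a, b)` bound to one `impl` block. The sketch below shows one way such a rewrite could look. It is a guess at the shape of the idea, not the analyser's actual code: every type and function name in it is hypothetical, types are represented as plain strings, and only the `+` to `Add.add` mapping is taken from the text.

```scala
// Illustrative only: rewriting a binary operator into a trait-method call.
// None of these names are taken from SyslAnalyzer.scala.
object OperatorResolutionSketch {
  sealed trait Expr
  final case class Var(name: String, tpe: String) extends Expr
  final case class BinOp(op: String, lhs: Expr, rhs: Expr) extends Expr
  // A call already bound to the impl of `traitName` for type `implFor`.
  final case class TraitCall(
      traitName: String,
      method: String,
      implFor: String,
      args: List[Expr]
  ) extends Expr

  // Operator symbol -> (trait, method). Only "+" -> Add.add comes from the text.
  val operatorTraits: Map[String, (String, String)] =
    Map("+" -> ("Add", "add"))

  // `impls` lists the (trait, type) pairs for which an impl block exists.
  // Only the simple case of a variable on the left-hand side is handled.
  def resolve(e: Expr, impls: Set[(String, String)]): Either[String, Expr] =
    e match {
      case BinOp(op, lhs @ Var(_, tpe), rhs) =>
        operatorTraits.get(op) match {
          case Some((tr, m)) if impls.contains((tr, tpe)) =>
            Right(TraitCall(tr, m, tpe, List(lhs, rhs)))
          case Some((tr, _)) =>
            Left(s"no impl of $tr for $tpe")
          case None =>
            Left(s"operator $op is not overloadable")
        }
      case other => Right(other)
    }
}
```

With `impls = Set(("Add", "Vec2"))`, resolving `BinOp("+", a, b)` for two `Vec2` variables produces a `TraitCall` for `Add.add`. With an empty impl set the same expression produces a `Left`, which is roughly where a type error would be reported.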
Tracing a compile
`sbt "syslCliJVM/run run hello.sysl --trace-analyzer"`

`--trace-analyzer` dumps every type decision the analyser makes. Use it when a generic instantiation is going somewhere surprising.
`sbt "syslCliJVM/run compile hello.sysl --emit asm -o hello.asm"`

The emitted assembly is human-readable. Every function is labelled with its mangled name; every runtime call (`malloc`, `__puts`, contract traps) is an explicit symbol.
`sbt "syslCliJVM/run compile hello.sysl --backend llvm --emit ll -o hello.ll"`

The emitted IR is normal LLVM. You can pipe it through `opt -O2 -S` to see what the optimiser does with it.