
x86 backend

internal/backend/cpu/x86. Hand-written instruction selection, register allocation, and encoding for x86-64 Linux. No LLVM, no MC, no third-party encoders.

Instruction selection

isel.go. Tree-pattern matching on MIR instructions. Each MIR op has one or more emit sites that pick an x86 form based on operand shape and types. Highlights:

  • Element-wise float ops on f32 tensors are emitted as 256-bit AVX instructions (vaddps, vmulps, etc. on ymm registers) when the run length is a multiple of 8, as 128-bit SSE (addps, mulps) when it is a multiple of 4 but not 8, and as a hybrid of SSE plus a scalar tail for irregular lengths. Strip-mining happens at isel time, so the encoder sees concrete instruction forms.
  • Reductions (+/) emit horizontal-add chains: vhaddps/haddps to collapse SIMD lanes, plus a scalar tail.
  • OpRodataPtr emits a single lea sym(%rip), reg; the ELF writer fixes up its 32-bit displacement with an R_X86_64_PC32 relocation.
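  • OpCall has separate lowering paths for float and integer arguments, mirroring the System V AMD64 split: float arguments travel in XMM registers, integer arguments in general-purpose registers.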
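The strip-mining rule in the first bullet can be modeled as a small planning function. This is a minimal sketch, not the code in isel.go: the names planStrip, Chunk, and Width are hypothetical, and using addss-style scalar mnemonics for the tail is an assumption, since the text only says "scalar tail".

```go
package main

import "fmt"

// Width is the number of f32 lanes covered by one emitted instruction.
// (Hypothetical type for illustration; not taken from isel.go.)
type Width int

const (
	Scalar Width = 1 // one f32 lane (scalar tail)
	SSE    Width = 4 // 128-bit xmm
	AVX    Width = 8 // 256-bit ymm
)

// Chunk is one emitted instruction covering Width elements from Offset.
type Chunk struct {
	Offset int
	Width  Width
}

// planStrip strip-mines a run of n f32 elements: all 8-wide when n%8 == 0,
// otherwise 4-wide for as much as fits, followed by a scalar tail when n is
// not a multiple of 4.
func planStrip(n int) []Chunk {
	w := SSE
	if n%8 == 0 {
		w = AVX
	}
	var chunks []Chunk
	i := 0
	for ; i+int(w) <= n; i += int(w) {
		chunks = append(chunks, Chunk{Offset: i, Width: w})
	}
	for ; i < n; i++ { // leftover elements become scalar ops
		chunks = append(chunks, Chunk{Offset: i, Width: Scalar})
	}
	return chunks
}

// mnemonic picks an instruction name for a base op such as "add" or "mul".
// The scalar "ss" suffix is an assumption about how the tail is emitted.
func (w Width) mnemonic(op string) string {
	switch w {
	case AVX:
		return "v" + op + "ps"
	case SSE:
		return op + "ps"
	default:
		return op + "ss"
	}
}

func main() {
	for _, c := range planStrip(10) {
		fmt.Printf("%s elems [%d, %d)\n", c.Width.mnemonic("add"), c.Offset, c.Offset+int(c.Width))
	}
}
```

For a run of 10 elements the plan is two 4-wide addps chunks followed by a two-element addss tail; a run of 16 becomes two vaddps chunks. Deciding this before encoding is what lets the encoder deal only in concrete instruction forms.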

Register allocation

regalloc.go. Linear-scan over MIR.

  • Caller-saved: RAX, RCX, RDX, RSI, RDI, R8–R11 (the full System V caller-saved set). Because calls clobber them, they can only hold values that are not live across a call. The allocator's general-purpose pool is a subset of these, namely RCX and R8–R11: RAX is reserved for the return value, RDX for the high half of multiply and divide results, and RDI/RSI are reused for argument shuffling at call sites.
  • Callee-saved: RBX, R12–R15 (plus RBP, which is reserved as the frame pointer and not in the allocator pool). Used for values live across calls.
  • SIMD pool: 15 slots, xmm1–xmm15 (xmm0 is reserved for the float return value, per the System V ABI). When CEIR produces more simultaneously live float values than the pool has slots, the allocator spills to stack slots; TestFoldAdd16 exercises this path. Because xmmN is the low 128 bits of ymmN, allocating either alias marks the other busy.
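The pool rules above can be illustrated with a compact linear scan over general-purpose registers. This is a minimal sketch under stated assumptions, not regalloc.go itself: the Interval type and allocate function are hypothetical, values not live across a call are assumed to prefer the scratch pool before falling back to callee-saved registers, and on pressure the sketch simply spills the current interval (a production linear scan usually spills whichever active interval ends furthest away). The SIMD pool would follow the same pattern with xmm1–xmm15.

```go
package main

import (
	"fmt"
	"sort"
)

// Interval is the live range [Start, End] of one virtual register, measured
// in MIR instruction indices. CrossesCall is true if a call occurs inside it.
// (Hypothetical type for illustration.)
type Interval struct {
	VReg        int
	Start, End  int
	CrossesCall bool
}

// General-purpose pools as described in the bullets above.
var (
	scratchPool = []string{"RCX", "R8", "R9", "R10", "R11"} // caller-saved
	calleePool  = []string{"RBX", "R12", "R13", "R14", "R15"} // callee-saved
)

// allocate runs a minimal linear scan and returns vreg -> register name,
// or "spill" when no suitable register is free. It sorts ivs in place.
func allocate(ivs []Interval) map[int]string {
	sort.Slice(ivs, func(i, j int) bool { return ivs[i].Start < ivs[j].Start })

	free := map[string]bool{}
	for _, r := range scratchPool {
		free[r] = true
	}
	for _, r := range calleePool {
		free[r] = true
	}

	out := map[int]string{}
	var active []Interval
	for _, iv := range ivs {
		// Expire intervals that ended before this one starts, freeing their registers.
		kept := active[:0]
		for _, a := range active {
			if a.End < iv.Start {
				free[out[a.VReg]] = true
			} else {
				kept = append(kept, a)
			}
		}
		active = kept

		// Values live across a call may only use callee-saved registers;
		// other values try the scratch pool first (assumed preference).
		candidates := calleePool
		if !iv.CrossesCall {
			candidates = append(append([]string{}, scratchPool...), calleePool...)
		}

		out[iv.VReg] = "spill" // default: nothing free, use a stack slot
		for _, r := range candidates {
			if free[r] {
				free[r] = false
				out[iv.VReg] = r
				active = append(active, iv)
				break
			}
		}
	}
	return out
}

func main() {
	regs := allocate([]Interval{
		{VReg: 0, Start: 0, End: 10, CrossesCall: true},
		{VReg: 1, Start: 1, End: 3},
		{VReg: 2, Start: 4, End: 6},
	})
	for v := 0; v < 3; v++ {
		fmt.Printf("v%d -> %s\n", v, regs[v])
	}
}
```

Here v0 spans a call and lands in RBX, while v1 and v2 never overlap, so they can share RCX. Six overlapping call-crossing values exhaust the five callee-saved registers and the sixth spills.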

Encoder

encode.go. Emits raw bytes for each instruction form.

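Each instruction form is assembled from the following fields, in this order, emitting only the ones it needs:

  • Mandatory prefix (66, F2 or F3) for SSE forms that require one, e.g. haddps (F2 0F 7C).
  • REX prefix (if needed), or, for AVX forms, a 2- or 3-byte VEX prefix (C5/C4), which encodes the REX bits and the mandatory prefix itself.
  • Opcode bytes.
  • ModR/M byte.
  • SIB byte (for memory operands with an index register, or with RSP/R12 as the base).
  • Displacement / immediate.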

The encoder is intentionally narrow: only the forms the codegen actually emits are implemented. A new instruction form is an explicit addition; there is no general assembler.

ABI

abi.go. System V AMD64.

Class             Registers
Integer args      RDI, RSI, RDX, RCX, R8, R9
Float args        XMM0–XMM7
Integer return    RAX
Float return      XMM0
Stack alignment   16 bytes at every call boundary
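The argument rows of the table can be read as two independent counters, one per register class, and the alignment row as a constraint on frame size. The sketch below illustrates both under assumptions: assignArgs and localFrameSize are hypothetical names, only scalar int and float arguments are handled (no structs or varargs), and the prologue shape push rbp; mov rbp, rsp; push saved...; sub rsp, N is assumed rather than taken from abi.go.

```go
package main

import "fmt"

// ArgKind is the System V register class of a scalar argument.
type ArgKind int

const (
	IntArg   ArgKind = iota // integers and pointers
	FloatArg                // f32 / f64
)

var (
	intArgRegs   = []string{"RDI", "RSI", "RDX", "RCX", "R8", "R9"}
	floatArgRegs = []string{"XMM0", "XMM1", "XMM2", "XMM3", "XMM4", "XMM5", "XMM6", "XMM7"}
)

// assignArgs maps each argument, in order, to a register. The two classes
// consume their register lists independently; overflow goes to the stack.
func assignArgs(kinds []ArgKind) []string {
	out := make([]string, len(kinds))
	nextInt, nextFloat := 0, 0
	for i, k := range kinds {
		switch {
		case k == IntArg && nextInt < len(intArgRegs):
			out[i] = intArgRegs[nextInt]
			nextInt++
		case k == FloatArg && nextFloat < len(floatArgRegs):
			out[i] = floatArgRegs[nextFloat]
			nextFloat++
		default:
			out[i] = "stack"
		}
	}
	return out
}

// localFrameSize returns N for the assumed prologue
//   push rbp; mov rbp, rsp; push <savedRegs registers>; sub rsp, N
// so that RSP is 16-byte aligned afterwards. On entry RSP is 8 mod 16 (the
// call pushed the return address) and push rbp brings it back to 0 mod 16,
// so only the saved-register pushes plus N must sum to a multiple of 16.
func localFrameSize(localBytes, savedRegs int) int {
	pushed := 8 * savedRegs
	return (localBytes+pushed+15)&^15 - pushed
}

func main() {
	fmt.Println(assignArgs([]ArgKind{IntArg, FloatArg, IntArg})) // [RDI XMM0 RSI]
	fmt.Println(localFrameSize(20, 1))                           // 24
}
```

Note that the float argument does not consume an integer slot: the third argument still gets RSI. Outgoing stack arguments, if any, would need their own padding on top of this.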

Runtime intrinsics

runtime.go. Hand-written x86-64 implementations of print_i32, print_f32, and print_str. Each is a small block of machine code that does the decimal or string formatting and then issues a Linux write(2) syscall directly. The driver appends them to the user's .o only when an unresolved relocation refers to one of these names and the user has not defined a function with the same name. Each intrinsic returns its argument unchanged, so calls compose in expression position; all three carry the @io effect.
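As a behavioral model only (the real intrinsics are pre-encoded x86-64 bytes, not Go), the sketch below shows what print_i32 has to do without a runtime library, the "return the argument" contract, and the driver's inclusion rule. formatI32, printI32, and neededIntrinsics are hypothetical names; syscall.Write stands in for the raw write(2) and is Unix-specific; whether the real intrinsic appends a newline is not stated, so this sketch writes none.

```go
package main

import (
	"fmt"
	"syscall"
)

// formatI32 renders v in decimal without strconv: digits are produced
// least-significant first, right to left, into a fixed buffer.
func formatI32(v int32) []byte {
	var buf [11]byte // len("-2147483648") == 11
	i := len(buf)
	u := uint32(v)
	if v < 0 {
		u = uint32(-int64(v)) // widen first so -2147483648 does not overflow
	}
	for {
		i--
		buf[i] = byte('0' + u%10)
		u /= 10
		if u == 0 {
			break
		}
	}
	if v < 0 {
		i--
		buf[i] = '-'
	}
	return buf[i:]
}

// printI32 writes the text with a raw write(2) on fd 1 and returns v
// unchanged, so calls can nest in expression position.
func printI32(v int32) int32 {
	_, _ = syscall.Write(1, formatI32(v))
	return v
}

var intrinsicNames = []string{"print_i32", "print_f32", "print_str"}

// neededIntrinsics applies the driver rule: include an intrinsic only if a
// relocation against its name is unresolved and the user did not define it.
func neededIntrinsics(unresolved, userDefined map[string]bool) []string {
	var out []string
	for _, name := range intrinsicNames {
		if unresolved[name] && !userDefined[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	sum := printI32(40) + printI32(2) // both calls write, and sum is 42
	fmt.Println()
	fmt.Println(sum)
	fmt.Println(neededIntrinsics(
		map[string]bool{"print_i32": true, "print_str": true},
		map[string]bool{"print_str": true},
	))
}
```

Because printI32 returns its input, printI32(40) + printI32(2) both prints and evaluates to 42, which is the composition property the paragraph describes. The inclusion check keeps a user-defined print_str from being shadowed by the built-in one.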

Tests

  • cpu/x86/encode_test.go — encoder unit tests, byte-exact.
  • cpu/x86/isel_test.go — isel pattern-match tests.
  • cpu/x86/regalloc_test.go — register allocation, spills.
  • tests/e2e/* — end-to-end fixtures that compile and run.