
Compile

The compiler is the last stage of the pipeline. It takes an AST, a symbol table, and a __future__ flag set, and produces a Code object. Internally it does three jobs that CPython splits across three source files (Python/codegen.c, Python/flowgraph.c, Python/assemble.c). gopy keeps them in one Go package because no caller crosses the boundary between them: codegen emits a flat instruction sequence, the flowgraph stage rewrites the sequence through a basic-block graph, and the assembler packs the result into the on-disk layout. A reader who is only interested in the last byte of bytecode can think of compile/ as one black box; a reader who is debugging a bad opcode emission or a wrong stack depth will want to know the boundary.

This page walks the three jobs in order, then describes the four output tables (code, consts, locations, exceptions) and the optimisation passes that run between codegen and assembly.

Where the code lives

gopy path | CPython source | Role
compile/compiler.go | Python/compile.c:L353 _PyAST_Compile | Top-level Compile and unit driver
compile/codegen.go | Python/codegen.c _PyCodegen_* | Compiler struct, scope entry, dispatch
compile/codegen_stmt.go | Python/codegen.c per-stmt visitors | Module, Interactive, Expr, Return, Pass
compile/codegen_stmt_control.go | Python/codegen.c if/for/while/match dispatch | Control flow shape
compile/codegen_stmt_funclike.go | Python/codegen.c compiler_function | FunctionDef, AsyncFunctionDef, Lambda
compile/codegen_stmt_try.go | Python/codegen.c compiler_try | Try, TryStar, exception setup
compile/codegen_stmt_with.go | Python/codegen.c compiler_with | With, AsyncWith
compile/codegen_stmt_match.go | Python/codegen.c compiler_match | PEP 634 structural matching
compile/codegen_class.go | Python/codegen.c compiler_class | ClassDef, descriptor protocol setup
compile/codegen_expr_*.go | Python/codegen.c per-expr visitors | Names, calls, containers, comprehensions
compile/codegen_addop.go | Python/codegen.c ADDOP_* macros | addOp, addOpI, addOpName, addConst
compile/codegen_fblock.go | Python/codegen.c compiler_unit.fblock | Frame block stack tracking
compile/codegen_annotations.go | Python/codegen.c codegen_deferred_annotations_body | PEP 649 annotation deferral
compile/flowgraph.go | Python/flowgraph.c cfg_builder | BasicBlock graph, public Optimize
compile/flowgraph_jumps.go | Python/flowgraph.c label resolution | Jump target resolution
compile/flowgraph_stackdepth.go | Python/flowgraph.c calculate_stackdepth | Stack-depth dataflow
compile/flowgraph_except.go | Python/flowgraph.c label_exception_targets | Try-region labelling
compile/flowgraph_passes.go | Python/flowgraph.c optimize_* | Peephole and graph optimisations
compile/assemble.go | Python/assemble.c _PyAssemble_MakeCodeObject | Final packing
compile/assemble_locations.go | Python/assemble.c emit_location_info | PEP 626 location table
compile/assemble_exceptions.go | Python/assemble.c emit_exception_table | PEP 657 exception table
compile/assemble_varint.go | Python/assemble.c assemble_varint | Varint encoder
compile/instrseq.go | Python/compile.c instr_sequence | Instr and Sequence types
compile/code.go | Include/cpython/code.h PyCodeObject | Code object struct
compile/opcodes_gen.go | generated from Lib/_opcode_metadata.py | Opcode constants and metadata

The compiler driver

compile/compiler.go:L30 Compile is the entry point. It takes an AST root (ast.Mod), a filename, an optimisation level, the __future__ flag set, and an already-built symbol table. It returns a single Code object.

// compile/compiler.go:L30
func Compile(mod ast.Mod, filename string, optimize int, ff future.Features, st *symtable.Table) (*Code, error) {
	c := NewCompiler(filename, optimize, ff, st)
	return c.compileMod(mod)
}

The Compiler struct holds a stack of Unit values, one per scope currently being compiled. A unit owns the instruction sequence being built, the constants pool, the variable name lists, the frame block stack (described below), and pointers to the parent unit and the symbol table Entry for this scope. When compileMod enters the module scope, it pushes a unit, emits instructions for the body, runs flowgraph optimisation, calls assemble, and pops the unit. The recursion handles nested scopes: a FunctionDef visitor pushes a new unit, recursively compiles the body, and pops. The Code object that comes out of the pop is stored in the parent unit's constants pool, where MAKE_FUNCTION will pick it up.

The unit-stack discipline keeps the per-scope state isolated. There is no shared mutable state between sibling scopes; nested-function compilation is just a recursive call.
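The push/pop discipline can be sketched in a few lines. This is a hedged miniature, not gopy's actual API: `unit`, `compiler`, `enterScope`, and `exitScope` are illustrative names, and the popped unit itself stands in for a real Code object.

```go
package main

import "fmt"

// A stripped-down unit: just a scope name and a constants pool.
// Names are illustrative, not gopy's actual types.
type unit struct {
	name   string
	consts []any
}

type compiler struct {
	stack []*unit
}

func (c *compiler) enterScope(name string) {
	c.stack = append(c.stack, &unit{name: name})
}

// exitScope pops the current unit and, if a parent exists, stores the
// finished scope in the parent's constants pool, where a later
// MAKE_FUNCTION would pick it up.
func (c *compiler) exitScope() *unit {
	u := c.stack[len(c.stack)-1]
	c.stack = c.stack[:len(c.stack)-1]
	if len(c.stack) > 0 {
		parent := c.stack[len(c.stack)-1]
		parent.consts = append(parent.consts, u)
	}
	return u
}

func main() {
	c := &compiler{}
	c.enterScope("<module>") // module scope
	c.enterScope("f")        // def f(): ...
	c.exitScope()            // f's code lands in the module's consts
	mod := c.exitScope()
	fmt.Println(len(mod.consts)) // 1: the code object for f
}
```

Because each unit is pushed before and popped after its children, sibling scopes never see each other's state, which is the isolation property described above.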

Dispatching the AST

compile/codegen.go:L159 Compiler.Codegen is the per-scope entry point. It switches on the AST root kind (Module, Expression, Interactive, FunctionType) and walks the body with one of the per-mode visitors. Most of the work happens in the statement and expression visitors split across the codegen_stmt_*.go and codegen_expr_*.go files.

The statement visitor dispatch is fanned across three helpers, mirroring the symtable split: visitStmtDef for bindings and scope-introducers, visitStmtControl for control flow, visitStmtSimple for everything else. Each helper returns true if it handled the statement, otherwise the dispatcher falls through to the next. Expression visitors fan similarly.
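The fan-out shape can be sketched as follows. All names here (`gen`, `visitStmtDef`, and the string-tagged `stmt`) are invented for illustration; gopy's visitors take real AST nodes and emit instructions rather than strings.

```go
package main

import "fmt"

// A toy statement: the kind string stands in for a real AST node.
type stmt struct{ kind string }

type gen struct{ out []string }

// Each helper claims the statements it knows and returns false
// to let the next helper try.
func (g *gen) visitStmtDef(s stmt) bool {
	if s.kind == "FunctionDef" || s.kind == "ClassDef" {
		g.out = append(g.out, "def:"+s.kind)
		return true
	}
	return false
}

func (g *gen) visitStmtControl(s stmt) bool {
	switch s.kind {
	case "If", "For", "While":
		g.out = append(g.out, "ctl:"+s.kind)
		return true
	}
	return false
}

func (g *gen) visitStmtSimple(s stmt) {
	g.out = append(g.out, "simple:"+s.kind)
}

// visitStmt is the dispatcher: fall through helper by helper.
func (g *gen) visitStmt(s stmt) {
	if g.visitStmtDef(s) || g.visitStmtControl(s) {
		return
	}
	g.visitStmtSimple(s)
}

func main() {
	g := &gen{}
	for _, k := range []string{"FunctionDef", "While", "Pass"} {
		g.visitStmt(stmt{kind: k})
	}
	fmt.Println(g.out) // [def:FunctionDef ctl:While simple:Pass]
}
```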

Emitting an opcode is done through one of four helpers in codegen_addop.go. addOp(op, pos) adds an opcode with no oparg and a source position. addOpI(op, arg, pos) adds one with a numeric oparg. addOpName(op, pool, name, pos) looks up or interns a name in the relevant pool (co_names, co_varnames, co_freevars, co_cellvars) and adds an opcode that references it. addConst(val, pos) interns a Python constant into co_consts and adds a LOAD_CONST referencing it. The helpers handle the name and constant interning and the location-information accumulation; EXTENDED_ARG widening and inline cache slots are dealt with later by the assembler. Either way, the visitors stay free of that bookkeeping.

// compile/codegen_addop.go:L40
func (c *Compiler) addOpI(op Opcode, arg int, pos ast.Pos) {
	c.unit.instrs = append(c.unit.instrs, Instr{
		Op:  op,
		Arg: arg,
		Pos: pos,
	})
}

The instruction sequence is a flat slice. The flowgraph pass later rebuilds it as a graph; codegen does not have to think in basic blocks.

The frame block stack

Some statements (break, continue, return, raise, unwinding through with or try finally) need to know what control structures are currently open. The frame block stack in compile/codegen_fblock.go records that. When codegen enters a while body, it pushes a LOOP block with the start and end labels of the loop. When codegen enters a try body, it pushes a TRY block with the handler label. When codegen enters a with body, it pushes a WITH block with the cleanup label.

A break statement is emitted by walking the stack from the top until a LOOP block is found, emitting whatever cleanups intervening WITH and TRY blocks require, then jumping to the loop's end label. The stack guarantees that the cleanup emissions happen in the right order: with-finally cleanups before try-finally cleanups before the loop exit.
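That walk can be modelled directly. The names below (`fblock`, `emitBreak`) and the string pseudo-instructions are invented for the sketch; gopy emits real opcodes through the addOp helpers.

```go
package main

import "fmt"

// Frame block kinds, mirroring the shape described above.
// Names are illustrative, not gopy's actual constants.
type fblockKind int

const (
	fbLoop fblockKind = iota
	fbTry
	fbWith
)

type fblock struct {
	kind fblockKind
	exit string // loop end label, finally label, or cleanup label
}

// emitBreak walks the frame block stack from the top. Intervening WITH
// and TRY blocks get their cleanups emitted in order; the first LOOP
// found supplies the jump target.
func emitBreak(stack []fblock) []string {
	var out []string
	for i := len(stack) - 1; i >= 0; i-- {
		switch stack[i].kind {
		case fbWith:
			out = append(out, "WITH_CLEANUP "+stack[i].exit)
		case fbTry:
			out = append(out, "FINALLY "+stack[i].exit)
		case fbLoop:
			return append(out, "JUMP "+stack[i].exit)
		}
	}
	return append(out, "ERROR: break outside loop")
}

func main() {
	// while ...: try: with ...: break
	stack := []fblock{
		{fbLoop, "loop_end"},
		{fbTry, "fin"},
		{fbWith, "cleanup"},
	}
	fmt.Println(emitBreak(stack))
	// [WITH_CLEANUP cleanup FINALLY fin JUMP loop_end]
}
```

The innermost block sits at the top of the stack, which is why the with cleanup comes out before the try finally, and both before the loop exit.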

The same machinery underwrites return inside a try-finally. The return value is held on the stack across the finally block, which means codegen has to emit the right PUSH_EXC_INFO and POP_EXCEPT dances so that the runtime stack remains consistent.

From flat sequence to basic blocks

After codegen finishes emitting, the compiler hands the flat instruction sequence to compile/flowgraph.go:L58. The flowgraph stage rebuilds the sequence as a graph of BasicBlock values. Each block is a maximal run of instructions ending in a jump, a return, a raise, or a fall-through to the next block. Blocks are stored in a slice; the graph edges are the per-block successor list.

The label resolution pass in compile/flowgraph_jumps.go is where the conversion happens. Codegen emits jumps with symbolic label targets (small integer ids). The pass walks the sequence, splits it at label boundaries, and resolves each jump's label to its destination block. After this pass every jump carries a concrete target that the assembler can turn into an offset in the final byte stream.
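A minimal sketch of the resolution step, simplified to resolve labels to instruction indices in a flat list rather than block indices; `instr` and `resolveLabels` are illustrative names, not gopy's actual types.

```go
package main

import "fmt"

// A jump carries a symbolic label id until resolution; -1 means
// "not a jump". Names are illustrative.
type instr struct {
	op    string
	label int // symbolic target for jumps, -1 otherwise
	arg   int // resolved numeric target
}

// resolveLabels rewrites every jump's arg to the index the label
// points at, using a label -> index map built while scanning.
func resolveLabels(instrs []instr, labels map[int]int) {
	for i := range instrs {
		if instrs[i].label >= 0 {
			instrs[i].arg = labels[instrs[i].label]
		}
	}
}

func main() {
	code := []instr{
		{op: "LOAD_NAME", label: -1},
		{op: "POP_JUMP_IF_FALSE", label: 7}, // symbolic target
		{op: "RETURN_VALUE", label: -1},
		{op: "LOAD_CONST", label: -1}, // label 7 points here
	}
	resolveLabels(code, map[int]int{7: 3})
	fmt.Println(code[1].arg) // 3
}
```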

Stack-depth analysis

The VM needs to know how much stack space a Code object can consume so it can preallocate the right size. The information is the co_stacksize field. Computing it is a dataflow problem on the basic-block graph: each opcode has a known stack effect (some opcodes have variable effects depending on their oparg, which compile/flowgraph_stackdepth.go:L26 getStackEffects handles), and the maximum depth across all reachable states is co_stacksize.

The pass runs a forward walk over the graph. It starts at the entry block with stack depth zero, propagates each instruction's effect to compute the depth after it, takes the maximum, and records the per-block post-depth. Successors receive the post-depth of their predecessor as their entry depth; if a block is reached by multiple predecessors with different entry depths the algorithm flags a bug (the depths must agree under the soundness assumption codegen enforces).
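The walk can be sketched over a toy graph. This uses per-instruction net stack effects supplied directly, where the real pass looks effects up in the opcode metadata; `block` and `maxStackDepth` are invented names.

```go
package main

import "fmt"

// A toy basic block: per-instruction net stack effects plus
// successor indices. Names are illustrative.
type block struct {
	effects []int // net stack effect of each instruction
	succs   []int
}

// maxStackDepth runs the forward walk described above: propagate
// entry depths along edges, track the running maximum, and insist
// that every block sees a single consistent entry depth.
func maxStackDepth(blocks []block) int {
	entry := make([]int, len(blocks))
	for i := range entry {
		entry[i] = -1 // unvisited
	}
	entry[0] = 0
	best := 0
	work := []int{0}
	for len(work) > 0 {
		b := work[0]
		work = work[1:]
		depth := entry[b]
		for _, e := range blocks[b].effects {
			depth += e
			if depth > best {
				best = depth
			}
		}
		for _, s := range blocks[b].succs {
			if entry[s] == -1 {
				entry[s] = depth
				work = append(work, s)
			} else if entry[s] != depth {
				panic("inconsistent entry depth: codegen bug")
			}
		}
	}
	return best
}

func main() {
	// Block 0 pushes two values, block 1 pops both.
	g := []block{
		{effects: []int{1, 1}, succs: []int{1}},
		{effects: []int{-1, -1}},
	}
	fmt.Println(maxStackDepth(g)) // 2
}
```

Blocks never reached from the entry keep their -1 sentinel and are skipped, which is the reachability property noted below.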

The analysis runs only on reachable blocks, so unreachable code (emitted by codegen but pruned by an earlier optimisation pass) does not contribute. CPython does the same; the algorithm has been stable for many releases.

Exception-region labelling

PEP 657 changed how the runtime finds an exception handler for each instruction. Instead of a stack of SETUP_FINALLY blocks that the interpreter pushes and pops, every Code object now carries a side table mapping instruction-range pairs to handler addresses. Building that table is the job of compile/flowgraph_except.go:L40 labelExceptionTargets.

The pass walks the graph with a stack of open try regions. When it enters a try region, it pushes the handler label and the stack depth that the runtime should restore. When it exits the region, it pops. For every instruction in the region, it records the (instruction offset, region) pair. The output is a list of ranges that the assembler turns into a varint-encoded byte sequence.
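A simplified model of the region walk, flattened to a pseudo-instruction stream and ignoring the way nested regions split an enclosing range; `entry`, `op`, and `buildTable` are invented names.

```go
package main

import "fmt"

// One exception-table entry: instructions in [start, end) are covered
// by handler, with depth values to keep on unwind. Names are
// illustrative, not gopy's actual types.
type entry struct {
	start, end, depth int
	handler           string
}

// A pseudo-instruction: "setup" opens a try region, "pop" closes
// the innermost one, "plain" is any covered instruction.
type op struct {
	kind    string
	handler string
	depth   int
}

// buildTable walks the stream with a stack of open regions and
// records one table entry per region when it closes.
func buildTable(ops []op) []entry {
	var stack, out []entry
	for pc, o := range ops {
		switch o.kind {
		case "setup":
			stack = append(stack, entry{start: pc + 1, handler: o.handler, depth: o.depth})
		case "pop":
			e := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			e.end = pc
			out = append(out, e)
		}
	}
	return out
}

func main() {
	tbl := buildTable([]op{
		{kind: "setup", handler: "h0", depth: 0},
		{kind: "plain"},
		{kind: "plain"},
		{kind: "pop"},
	})
	fmt.Println(tbl[0].start, tbl[0].end, tbl[0].handler) // 1 3 h0
}
```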

The stack depth is the crucial part. When an exception fires mid-expression, the runtime has to unwind the value stack before jumping to the handler. The depth recorded in the exception table tells it how many values to pop. CPython is the same: Python/flowgraph.c:L885 label_exception_targets walks the same way.

Optimisation passes

compile/flowgraph_passes.go holds the passes that run between graph construction and assembly. The current set in gopy is a subset of CPython's: unreachable blocks are removed, redundant NOP instructions are dropped, jumps that target the next instruction are erased, jumps to unconditional jumps are folded through, and a handful of peephole rewrites are applied.

The passes are deliberately conservative. They run on the graph after exception-region labelling so that they do not have to reason about handler edges; they preserve every observable side effect, including the location-information shape; and they preserve the in-graph order of instructions, so a block that codegen built last stays last in the graph.

CPython 3.14 has many more passes than gopy presently ports. Super-instruction fusion and constant propagation across blocks are not yet in. The missing passes cost performance but not correctness: the same Code object would have been valid in 3.13 before those passes existed.

The assembler

compile/assemble.go:L33 Assemble flattens the graph back to a linear instruction list and emits the four parallel tables that make up a Code object.

The bytecode itself (co_code) is the linear instruction list packed as 16-bit code units (_Py_CODEUNIT). Each unit holds an opcode in the high byte and an oparg in the low byte. Opargs larger than 255 are widened with one or more EXTENDED_ARG prefix units. Inline cache slots, which the specializer fills in later, are reserved by emitting CACHE placeholder units after the opcode they belong to; the number per opcode is fixed by the opcode metadata in compile/opcodes_gen.go.
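The widening rule is mechanical enough to sketch. The opcode numbers below are made up for the example, and the real pass is more involved because widening a jump changes the distances other jumps cover, so the assembler has to iterate; this shows only the per-instruction encoding.

```go
package main

import "fmt"

// opExtendedArg is an invented opcode number for the sketch.
const opExtendedArg = 144

// pack emits one 16-bit code unit (opcode high byte, oparg low byte)
// per instruction; oparg bytes above the lowest become EXTENDED_ARG
// prefix units, most significant byte first.
func pack(op, arg int) []uint16 {
	var units []uint16
	for shift := 24; shift > 0; shift -= 8 {
		// Emit a prefix for every byte above the lowest; inner zero
		// bytes still count once a higher byte was nonzero.
		if b := (arg >> shift) & 0xFF; b != 0 || len(units) > 0 {
			units = append(units, uint16(opExtendedArg)<<8|uint16(b))
		}
	}
	return append(units, uint16(op)<<8|uint16(arg&0xFF))
}

func main() {
	fmt.Println(len(pack(100, 7)))   // 1: small oparg, no prefix
	fmt.Println(len(pack(100, 300))) // 2: one EXTENDED_ARG prefix
	u := pack(100, 300)              // 300 = 0x012C
	fmt.Printf("%x %x\n", u[0], u[1]) // 9001 642c
}
```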

The constants pool (co_consts) holds every literal and every nested Code object. The assembler does not see the pool grow; codegen has already added each constant through addConst, which interns by identity for hashable values and by deep-equality for specific nested containers.

The location table (co_linetable, a name kept for compatibility even though the table now covers columns as well as lines) is encoded by compile/assemble_locations.go:L1. The format is the PEP 626 varint encoding: each entry carries an instruction count and a line/column delta from the previous entry, with a small set of short forms for the common cases (entire instruction on the same line as the previous one, instruction with no line info). Decoding is the reverse walk, used at runtime by the traceback machinery to attribute each instruction to its source location.
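The varint shape can be sketched as follows. This matches the form of CPython's write_varint and write_signed_varint (6-bit groups, least significant first, bit 6 as the continuation flag, sign folded into the low bit); whether gopy's assemble_varint.go is byte-identical is an assumption of this sketch.

```go
package main

import "fmt"

// writeVarint appends an unsigned value as 6-bit groups, least
// significant group first, with bit 6 (0x40) set on every group
// except the last.
func writeVarint(buf []byte, v uint) []byte {
	for v >= 64 {
		buf = append(buf, byte(v&63)|64)
		v >>= 6
	}
	return append(buf, byte(v))
}

// writeSignedVarint folds the sign into the lowest bit before
// encoding, so small negative line deltas stay one byte.
func writeSignedVarint(buf []byte, v int) []byte {
	if v < 0 {
		return writeVarint(buf, uint(-v)<<1|1)
	}
	return writeVarint(buf, uint(v)<<1)
}

func main() {
	fmt.Println(writeVarint(nil, 5))        // [5]
	fmt.Println(writeVarint(nil, 200))      // [72 3]
	fmt.Println(writeSignedVarint(nil, -1)) // [3]
}
```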

The exception table (co_exceptiontable) is encoded by compile/assemble_exceptions.go:L1. The format is also a varint list, with each entry holding a start offset, an end offset, a handler offset, the stack depth to restore, and a flag bit for whether lasti should be restored. The runtime binary-searches this table when an exception fires, finding the entry whose range covers the current instruction.

Each of the four tables is emitted in the same order as it would be marshalled to .pyc, which makes the marshal step in marshal/ a serialisation that preserves byte order rather than a rewrite.

// compile/assemble.go:L33
func Assemble(seq *Sequence, info Info, unit *Unit, filename string) (*Code, error) {
	// resolve labels and widen opargs
	code, err := pack(seq)
	if err != nil {
		return nil, err
	}
	locs, err := emitLocations(seq, info)
	if err != nil {
		return nil, err
	}
	excs, err := emitExceptions(info)
	if err != nil {
		return nil, err
	}
	return &Code{
		Code:           code,
		Consts:         unit.consts,
		Names:          unit.names,
		Varnames:       unit.varnames,
		Freevars:       unit.freevars,
		Cellvars:       unit.cellvars,
		Linetable:      locs,
		Exceptiontable: excs,
		Filename:       filename,
		Name:           unit.name,
		Argcount:       unit.argcount,
		Stacksize:      info.MaxStackDepth,
		Flags:          unit.flags,
		// ...
	}, nil
}

Differences from CPython

  • Codegen, flowgraph, and assembler are one Go package. CPython keeps them in three C source files; consolidating them is safe because no caller crosses the internal boundaries, and a single package is the natural unit of separation in Go.
  • Some advanced optimisation passes are not yet ported. The current set covers reachability, no-op removal, and basic jump folding. Super-instruction fusion and inter-block constant propagation are pending. The pages on specializer and optimizer cover the runtime tiers that partially compensate at execution time.
  • The assembler emits Code objects with the same field layout CPython produces, so a marshalled .pyc round-trips byte for byte between the two implementations.
  • The frame block stack is a slice of structs, not a linked list of malloc'd records. The behaviour is identical.

Reference

  • PEP 626. Precise line numbers for debugging.
  • PEP 634. Structural pattern matching.
  • PEP 649. Deferred evaluation of annotations.
  • PEP 657. Including fine-grained error locations in tracebacks.
  • PEP 695. Type parameter syntax.
  • symtable for the prerequisite scope analysis.
  • vm for what executes the Code object.
  • specializer for the runtime adaptive layer.