
1714. Port CPython's Tools/cases_generator into gopy

Ground rule

Same rule as 1704 / 1705 / 1708 / 1712 / 1713. Vendor full subsystems, file by file. No partial slices, no name-only shims. This spec is the most aggressive application of that rule yet: every opcode in gopy, every cache layout, every stack effect, every dispatch arm, every uop body gets re-rooted onto CPython's own DSL. After this spec lands, vm/eval_*.go, specialize/*.go cache-access calls, compile/opcode_caches.go, and compile/opcodes_gen.go are all generated from the same inputs CPython compiles itself from. The hand-rolled dispatch loop, the hand-counted cache cells, and the hand-mirrored stack effect tables go away.

This spec pauses spec 1713. Byte-equality work resumes once 1714 lands, because every remaining 1713 row (codegen audit, flowgraph audit, marshal port, .pyc parity) is downstream of the same single-source-of-truth question: gopy can only match CPython's compiled output if both sides agree on opcode numbering, cache widths, and stack effects, and gopy's current copies of those tables are hand-maintained and divergent. The LOAD_GLOBAL cache cell-4 vs cell-1 bug that surfaced during 1713 P2 work is the canonical example: the specializer wrote past the cache and overwrote the next opcode's first byte; the VM read from the same out-of-bounds slot, so both sides "agreed" on a wrong layout for months. The bug only surfaced when an unrelated change exposed a codepath where the next instruction happened to be PUSH_NULL (opcode 33), got rewritten as RESERVED (17), and CALL popped a tuple iterator from the wrong stack slot. There is no test that catches this class of bug today. There is no design pattern in gopy that prevents it. There is in CPython, and it is Tools/cases_generator.
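The overwrite described above can be reproduced in a few lines. This is a minimal sketch, not gopy's actual code: the helper names, the 0-based cell numbering, and the little-endian two-bytes-per-codeunit layout are assumptions made for illustration; only the opcode numbers (91, 33, 17) and the four-codeunit cache width come from this spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Opcode numbers as quoted in this spec; everything else is hypothetical.
const (
	opLoadGlobal = 91
	opPushNull   = 33
	cacheCells   = 4 // codeunits reserved after LOAD_GLOBAL
)

// setCell writes a 16-bit value into cache cell `cell` (0-based) of the
// instruction that starts at codeunit `instr`. It does not check the cell
// index against cacheCells, which is exactly the hazard.
func setCell(code []byte, instr, cell int, v uint16) {
	off := 2 * (instr + 1 + cell)
	binary.LittleEndian.PutUint16(code[off:], v)
}

// demo builds LOAD_GLOBAL + 4 cache codeunits + PUSH_NULL, then writes
// "cell 4", which is one past the cache (valid cells are 0..3).
func demo() (before, after byte) {
	code := make([]byte, 2*(1+cacheCells+1))
	code[0] = opLoadGlobal
	next := 2 * (1 + cacheCells) // byte offset of the next instruction
	code[next] = opPushNull
	before = code[next]
	setCell(code, 0, 4, 17)
	after = code[next]
	return before, after
}

func main() {
	before, after := demo()
	fmt.Printf("next opcode before=%d after=%d\n", before, after)
}
```

A reader that also uses cell 4 sees a consistent value, which is why both sides appeared to agree while the following instruction was silently corrupted.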

Goal

tools/regen-cases.sh
git diff --exit-code
# CI fails if any generated file is out of date

Concretely, the deliverable is a generator under tools/cases_generator/ that:

  1. Reads CPython 3.14.5's Python/bytecodes.c, Python/optimizer_bytecodes.c, and Include/internal/pycore_code.h from the vendored copy under tools/cases_generator/inputs/.
  2. Emits a fixed set of Go files under compile/, specialize/, and vm/ named *_gen.go.
  3. Is byte-for-byte reproducible: running tools/regen-cases.sh on a clean checkout leaves the tree unchanged.

Spec done = every opcode dispatch arm in gopy is generated, every cache access goes through a typed accessor whose layout is generated, every stack effect comes from the generated metadata table, every family / deopt relation is generated, and the hand-rolled equivalents are deleted. A reproducibility gate (test/gate/cases_generator_reproducibility_test.go) keeps it that way.
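As a rough illustration of what the reproducibility gate checks, the sketch below compares a checked-in file against freshly regenerated bytes and reports the first line that drifted. The function name `firstDrift` and the line-oriented reporting are assumptions; the real gate may simply rely on `git diff --exit-code`.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstDrift returns 0 when the two byte slices are identical, otherwise the
// 1-based number of the first line that differs.
func firstDrift(onDisk, regenerated []byte) int {
	if bytes.Equal(onDisk, regenerated) {
		return 0
	}
	a := bytes.Split(onDisk, []byte("\n"))
	b := bytes.Split(regenerated, []byte("\n"))
	for i := 0; i < len(a) && i < len(b); i++ {
		if !bytes.Equal(a[i], b[i]) {
			return i + 1
		}
	}
	// One input is a strict line-prefix of the other.
	if len(a) < len(b) {
		return len(a) + 1
	}
	return len(b) + 1
}

func main() {
	checkedIn := []byte("package compile\nconst NOP = 0\n")
	fresh := []byte("package compile\nconst NOP = 27\n")
	if line := firstDrift(checkedIn, fresh); line != 0 {
		fmt.Printf("generated file is stale, first drift at line %d\n", line)
	}
}
```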

Why this spec exists

vm/eval_*.go is hand-rolled. specialize/*.go is hand-rolled. compile/opcode_caches.go is hand-rolled. They were ported one opcode at a time from CPython 3.14's Python/bytecodes.c and related headers, and each port translated five facets separately: specializer write, VM read, compile-side cache size, stack effect, and family membership. Five places per opcode that all have to agree.

CPython does not have this problem. Every opcode in CPython is defined exactly once in Python/bytecodes.c using a small DSL (inst, op, macro, family, pseudo, tier1, tier2, label). A pipeline of generators under Tools/cases_generator/ then emits:

| CPython output file | What it drives |
| --- | --- |
| Python/generated_cases.c.h | Tier-1 dispatch bodies inside _PyEval_EvalFrameDefault |
| Python/executor_cases.c.h | Tier-2 uop bodies inside the executor loop |
| Include/internal/pycore_opcode_metadata.h | Stack effects, cache sizes, names, family table, deopt map |
| Include/internal/pycore_uop_metadata.h | Uop names, flags, output effects |
| Include/opcode_ids.h | Opcode numeric IDs |
| Python/opcode_targets.h | Computed-goto dispatch table |
| Lib/_opcode_metadata.py | Python-side mirror used by dis |

The cache layout (_PyLoadGlobalCache { uint16_t counter; uint16_t index; uint16_t module_keys_version; uint16_t builtin_keys_version; } in Include/internal/pycore_code.h:117) is paired with the inst(LOAD_GLOBAL_MODULE, ...) body in Python/bytecodes.c via a _PyLoadGlobalCache *cache = ... declaration the generator parses out of the body. The generator enforces the pairing. Drift is a build error.

Hand-porting throws this out. We have, today, in gopy:

  • compile/opcode_caches.go declaring LOAD_GLOBAL: 4 codeunits.
  • specialize/load_global.go writing cells {2,3,4} (until 1714 P0 prework fixed it to {1,2,3}).
  • vm/eval_specialized_load_global.go reading cell 4 (until the same fix moved it to cell 1).
  • objects/dict.go exposing GetKeysVersion().
  • compile/opcodes_gen.go numbering LOAD_GLOBAL_MODULE independently of CPython's opcode_ids.h.

Five files. Five sources of truth. The same bug class is latent in LOAD_ATTR_INSTANCE_VALUE, LOAD_ATTR_SLOT, STORE_ATTR_INSTANCE_VALUE, CALL_PY_EXACT_ARGS, BINARY_SUBSCR_LIST_INT, and every other specialized arm. We have not seen the bugs yet because the inputs that trigger them have not been exercised. The 1712 specializer audit and the 1713 byte-equality gate are both going to surface them, one by one, the slow way. Or we can port the generator and delete the class.

The same argument applies to tier-2 uops. Spec 1712 hand-ported 14 of ~285 uops. Each one is a fresh translation from Python/optimizer_bytecodes.c. Without the generator, every uop is a fresh chance to drift from the tier-1 body of the same name. CPython's generator emits both sides from the same source.

CPython architecture

CPython's pipeline, with line counts taken from a fresh $HOME/cpython-314 clone at v3.14.5:

Python/bytecodes.c 5549 lines
├── 92 inst() tier-1 + tier-2 fused body
├── 145 op() tier-2-only or composable
├── 109 macro() composition of op + cache cells
├── 17 family() specialization families
├── 11 pseudo() compiler-only synthetics
└── 3 label() shared error-handler labels

Python/optimizer_bytecodes.c 1107 lines
├── op() bodies that override the tier-1 op() of the same name
└── used for tier-2 abstract interpretation (sym values, guards)

Include/internal/pycore_code.h cache structs per opcode family
Include/opcode_ids.h opcode numeric IDs (generated)
Include/internal/pycore_opcode_metadata.h per-opcode metadata (generated)

Tools/cases_generator/ 5811 lines
├── lexer.py 395 lines tokenizer for the C-with-DSL input
├── plexer.py 124 lines peekable lexer wrapper
├── parser.py 78 lines thin entry point on top of parsing.py
├── parsing.py 743 lines real parser; produces inst/op/macro/family AST
├── analyzer.py 1207 lines resolves macros, computes effects, walks bodies
├── stack.py 737 lines stack-effect tracker; emits push/pop sequences
├── cwriter.py 179 lines C output sink with indent tracking
├── generators_common.py 708 lines body emission shared between tier-1 / tier-2
├── tier1_generator.py 306 lines emits generated_cases.c.h
├── tier2_generator.py 228 lines emits executor_cases.c.h
├── optimizer_generator.py 244 lines emits abstract interpreter cases
├── opcode_id_generator.py 63 lines emits opcode_ids.h
├── opcode_metadata_generator.py 418 lines emits pycore_opcode_metadata.h
├── py_metadata_generator.py 95 lines emits Lib/_opcode_metadata.py
├── target_generator.py 94 lines emits opcode_targets.h (computed-goto)
├── uop_id_generator.py 79 lines emits pycore_uop_ids.h
└── uop_metadata_generator.py 98 lines emits pycore_uop_metadata.h

The full toolchain is 5811 lines of Python. The inputs are 6656 lines of C (bytecodes.c + optimizer_bytecodes.c). The C-side generated output across all targets is roughly 18000 lines.

Five concepts carry most of the weight, and the gopy port must preserve all of them:

Stack effects. Every inst() declaration looks like inst(LOAD_FAST, (-- value)) or inst(BINARY_OP, (lhs, rhs -- res)). The names and order on each side declare the stack inputs (popped) and outputs (pushed). The generator uses these to emit PEEK, STACK_SHRINK, STACK_GROW, and the local-variable initialization that gives the C body a typed binding for each input/output. stack.py tracks the running effect across a macro() composition and emits the minimum-cost push/pop sequence.
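To make the declaration format concrete, here is a small sketch that splits a stack-effect declaration into its popped and pushed names. It is an illustration only: `parseEffect` and the `effect` type are hypothetical, and the sketch ignores features of the real DSL such as typed entries, array-sized entries, and cache operands.

```go
package main

import (
	"fmt"
	"strings"
)

// effect holds the names on each side of the "--" separator.
type effect struct {
	inputs, outputs []string
}

// parseEffect handles the simple "(a, b -- c)" shape only.
func parseEffect(decl string) (effect, error) {
	decl = strings.TrimSpace(decl)
	decl = strings.TrimPrefix(decl, "(")
	decl = strings.TrimSuffix(decl, ")")
	in, out, ok := strings.Cut(decl, "--")
	if !ok {
		return effect{}, fmt.Errorf("missing -- in %q", decl)
	}
	return effect{inputs: names(in), outputs: names(out)}, nil
}

// names splits a comma-separated side and drops empty entries.
func names(side string) []string {
	var out []string
	for _, f := range strings.Split(side, ",") {
		if f = strings.TrimSpace(f); f != "" {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	for _, d := range []string{"(-- value)", "(lhs, rhs -- res)"} {
		e, err := parseEffect(d)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-20s pops %d, pushes %d\n", d, len(e.inputs), len(e.outputs))
	}
}
```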

Cache cells. A macro(LOAD_GLOBAL) = _SPECIALIZE_LOAD_GLOBAL + counter/1 + globals_version/1 + builtins_version/1 + _LOAD_GLOBAL + _PUSH_NULL_CONDITIONAL declares the cache layout inline. The /N suffix declares how many codeunits each cell occupies; _SPECIALIZE_LOAD_GLOBAL contributes its own counter/1 cell, so the cells sum to the 4 codeunits that INLINE_CACHE_ENTRIES_LOAD_GLOBAL reserves. The generator computes the offset of each named cell, threads a _PyLoadGlobalCache *cache = (_PyLoadGlobalCache *)next_instr; at the top of the body, and rewrites cache->index references to the right offset in the codeunit stream. The struct in pycore_code.h matches by name and order. Drift is a compile error in CPython.
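The offset computation is a running sum over the /N widths. The sketch below shows the idea on a hypothetical cell list that borrows its field names from the _PyLoadGlobalCache struct quoted earlier; `layout`, the `cell` type, and the `_BODY` placeholder are assumptions, and unlike the real analyzer this sketch does not look inside uops for cells they declare themselves.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cell is one named cache entry with its offset and width in codeunits.
type cell struct {
	name   string
	offset int
	width  int
}

// layout parses "name/N + name/N + ..." and returns the cells plus the total
// cache width. Terms without a /N suffix are skipped as non-cache parts.
func layout(spec string) ([]cell, int, error) {
	var cells []cell
	total := 0
	for _, term := range strings.Split(spec, "+") {
		term = strings.TrimSpace(term)
		name, width, ok := strings.Cut(term, "/")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(width)
		if err != nil || n <= 0 {
			return nil, 0, fmt.Errorf("bad cell width in %q", term)
		}
		cells = append(cells, cell{name: name, offset: total, width: n})
		total += n
	}
	return cells, total, nil
}

func main() {
	spec := "counter/1 + index/1 + module_keys_version/1 + builtin_keys_version/1 + _BODY"
	cells, total, err := layout(spec)
	if err != nil {
		panic(err)
	}
	for _, c := range cells {
		fmt.Printf("%-22s offset=%d width=%d\n", c.name, c.offset, c.width)
	}
	fmt.Println("total codeunits:", total)
}
```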

Families and deopt. family(LOAD_GLOBAL, INLINE_CACHE_ENTRIES_LOAD_GLOBAL) = { LOAD_GLOBAL_MODULE, LOAD_GLOBAL_BUILTIN }; declares that LOAD_GLOBAL's specialized arms are LOAD_GLOBAL_MODULE and LOAD_GLOBAL_BUILTIN, all sharing the same cache size. The generator emits the deopt map: a specialized arm that hits DEOPT_IF(cond) jumps back to the family parent without changing the cache layout. This is the mechanism gopy implements ad-hoc in specialize/deopt.go today.
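The deopt map is the inverse of the family table, which is why both can come from one declaration. A minimal sketch, using string opcode names and a hypothetical `deoptMap` helper rather than gopy's real types:

```go
package main

import (
	"fmt"
	"sort"
)

// deoptMap inverts parent -> arms into arm -> parent and rejects an arm that
// appears in more than one family.
func deoptMap(family map[string][]string) (map[string]string, error) {
	out := make(map[string]string)
	for parent, arms := range family {
		for _, arm := range arms {
			if prev, dup := out[arm]; dup {
				return nil, fmt.Errorf("%s is in two families: %s and %s", arm, prev, parent)
			}
			out[arm] = parent
		}
	}
	return out, nil
}

func main() {
	family := map[string][]string{
		"LOAD_GLOBAL": {"LOAD_GLOBAL_MODULE", "LOAD_GLOBAL_BUILTIN"},
	}
	deopt, err := deoptMap(family)
	if err != nil {
		panic(err)
	}
	arms := make([]string, 0, len(deopt))
	for arm := range deopt {
		arms = append(arms, arm)
	}
	sort.Strings(arms) // stable output order
	for _, arm := range arms {
		fmt.Printf("%s -> %s\n", arm, deopt[arm])
	}
}
```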

DEOPT_IF / EXIT_IF / ERROR_IF. These three macros in the body desugar into the three exit paths the generator must emit: deopt back to the parent opcode, exit the tier-2 trace, or jump to the per-opcode error label. The generator knows which macro is legal in which context (DEOPT_IF only inside specialized arms, EXIT_IF only in tier-2, ERROR_IF anywhere) and emits the right restoration code (stack rollback, refcount adjustment) for each.

Tier-1 / tier-2 fusion. inst(NAME, ...) declares a body that serves both tiers. op(NAME, ...) declares a tier-2-only body or a tier-1-only body, depending on whether it's referenced by a macro(). optimizer_bytecodes.c overrides selected op() bodies with abstract-interpretation versions used during optimization. The generator emits the tier-1 case, the tier-2 uop body, and the optimizer-case body from the same source. Spec 1712's hand-rolled approach has all three drift independently.

gopy current state (2026-05-16)

| Layer | gopy file(s) | What it should generate from | Status |
| --- | --- | --- | --- |
| Opcode IDs | compile/opcodes_gen.go | Include/opcode_ids.h | hand-rolled, generated-name notwithstanding. Source-of-truth: a Python script we run by hand, not the DSL. |
| Cache widths | compile/opcode_caches.go | pycore_opcode_metadata.h _PyOpcode_Caches[] | hand-rolled. 4 codeunits for LOAD_GLOBAL, etc. |
| Stack effects | inline in each eval_*.go arm | pycore_opcode_metadata.h _PyOpcode_num_popped/num_pushed | hand-rolled, no cross-check. |
| Cache layouts (struct) | specialize/cache.go SetCacheCell / CacheCell | pycore_code.h _Py<Op>Cache structs | hand-rolled offsets. Cell index is a magic number at every call site. |
| Tier-1 dispatch | vm/eval_simple.go | Python/generated_cases.c.h | hand-rolled switch statement. |
| Tier-1 specialized arms | vm/eval_specialized_*.go | Python/generated_cases.c.h (same cases) | hand-rolled. |
| Tier-2 uops | vm/uops/*.go (per 1712) | Python/executor_cases.c.h | 14 of ~285 hand-rolled. |
| Family table | specialize/quicken.go | pycore_opcode_metadata.h _PyOpcode_Caches+family arrays | hand-rolled. |
| Deopt map | specialize/deopt.go | pycore_opcode_metadata.h _PyOpcode_Deopt[] | hand-rolled. |
| Specializer skeletons | specialize/*.go per-family | DSL family declarations | hand-rolled. |

Every row above is in scope. By the end of this spec, every "hand-rolled" becomes "generated", or the row gets explicitly carved out with a documented reason.

Files in scope

Sources of truth live under $HOME/cpython-314/ and are mirrored into tools/cases_generator/inputs/ so the generator runs hermetically. Every file below is ported in full, with // CPython: <file>:<line> <function> citations on the Go emitters and on any non-trivial bridging glue.

| # | CPython source | gopy target | Why |
| --- | --- | --- | --- |
| A | Tools/cases_generator/lexer.py | tools/cases_generator/lexer.py (vendored verbatim) | DSL tokenizer. No Go port. Run under host Python 3.14. |
| B | Tools/cases_generator/plexer.py | tools/cases_generator/plexer.py (vendored) | Peekable wrapper. |
| C | Tools/cases_generator/parser.py + parsing.py | vendored | DSL parser. Produces Inst, Op, Macro, Family, Pseudo AST nodes. |
| D | Tools/cases_generator/analyzer.py | vendored | Resolves macros, computes stack effects, walks bodies for DEOPT_IF/ERROR_IF/EXIT_IF. |
| E | Tools/cases_generator/stack.py | vendored | Stack-effect tracker; emits push/pop sequences. |
| F | Tools/cases_generator/cwriter.py | vendored and re-implemented as gowriter.py | C writer + a Go writer sharing the same indent-tracking and emit API. |
| G | Tools/cases_generator/generators_common.py | vendored, plus a go_generators_common.py companion | Body emission. The companion handles Go-specific macro expansion (DEOPT_IF → return 0, false, ERROR_IF → return e.raise(err), etc). |
| H | Tools/cases_generator/opcode_id_generator.py | + gopy_opcode_id_generator.py | New emitter targeting compile/opcode_ids_gen.go. |
| I | Tools/cases_generator/opcode_metadata_generator.py | + gopy_opcode_metadata_generator.py | New emitter targeting compile/opcode_metadata_gen.go (replaces compile/opcode_caches.go). |
| J | Tools/cases_generator/uop_id_generator.py + uop_metadata_generator.py | + Go companions | Tier-2 uop tables. |
| K | Tools/cases_generator/tier1_generator.py | + gopy_tier1_generator.py | Emits vm/eval_dispatch_gen.go: the dispatch switch + per-opcode body harness. |
| L | Tools/cases_generator/tier2_generator.py | + gopy_tier2_generator.py | Emits vm/eval_uops_gen.go: uop dispatch + body harness. |
| M | Tools/cases_generator/optimizer_generator.py | + gopy_optimizer_generator.py | Emits compile/optimizer_cases_gen.go for spec 1712's abstract interpreter. |
| N | Tools/cases_generator/target_generator.py | (not ported) | CPython-specific computed-goto. Go's switch is fine. Documented carve-out. |
| O | Tools/cases_generator/py_metadata_generator.py | (vendored only) | Emits Lib/_opcode_metadata.py; gopy already vendors that file via 1710 T5.1. No regeneration needed; we ship CPython's. |
| P | Python/bytecodes.c (v3.14.5) | tools/cases_generator/inputs/bytecodes.c | The single source. Frozen per CPython tag; bumped together with 1707 sync. |
| Q | Python/optimizer_bytecodes.c (v3.14.5) | tools/cases_generator/inputs/optimizer_bytecodes.c | Tier-2 source. |
| R | Include/internal/pycore_code.h | tools/cases_generator/inputs/pycore_code.h | Cache struct definitions. Parsed by a new cache_struct_parser.py to emit specialize/cache_layouts_gen.go. |

Output Go files (all *_gen.go, all under generator control, none hand-edited):

| Output file | Lines (estimated) | Replaces |
| --- | --- | --- |
| compile/opcode_ids_gen.go | ~600 | compile/opcodes_gen.go |
| compile/opcode_metadata_gen.go | ~1500 | compile/opcode_caches.go |
| compile/optimizer_cases_gen.go | ~2500 | parts of vm/uops/*.go (abstract interp) |
| specialize/cache_layouts_gen.go | ~400 | implicit layout knowledge across specialize/*.go |
| specialize/family_gen.go | ~200 | specialize/quicken.go family table + specialize/deopt.go map |
| vm/eval_dispatch_gen.go | ~4000 | core of vm/eval_simple.go + vm/eval_specialized*.go |
| vm/eval_uops_gen.go | ~3000 | core of vm/uops/*.go |

Phase index

Each phase ports one block end to end. Status lives on the Checklist at the bottom of this spec, mirrored per row here. The phase order is chosen so that every phase ends with a green CI, including phases where the generator is partially wired: each emitter ships with a parity test that diffs its output against the hand-rolled file it will eventually replace, and only flips the hand-rolled file's role to "fallback" when the parity test goes green.

| Phase | Block | Gate | Status |
| --- | --- | --- | --- |
| 0 | Vendor Tools/cases_generator/ verbatim. Mirror inputs (bytecodes.c, optimizer_bytecodes.c, pycore_code.h) under Tools/cases_generator/inputs/ at the 3.14.5 hash. Add Tools/regen-cases/ (Go driver) that invokes the generators against our vendored inputs and reproduces CPython's own outputs into a scratch dir for diffing. | upstream reproducibility: regenerating CPython's 9 generator outputs from our vendored inputs matches the files in $HOME/cpython-314 byte for byte (header lines excluded) | DONE (CI pending) |
| 1 | Output abstraction. Port cwriter.py into a gowriter.py sibling that shares the indent/scope API but emits Go syntax. Implement go_generators_common.py with macro→Go bindings for the constant macros (PyStackRef_FromPyObject*, PyStackRef_AsPyObject*, STACK_SHRINK, STACK_GROW, PEEK, POKE, etc). | unit-test corpus: 30 hand-written macro snippets emit known-good Go | DONE (corpus at 20/30; remaining 10 stage with Phase 5 op signatures) |
| 2 | Metadata + opcode-id emitters. Ship gopy_opcode_id_generator.py and gopy_opcode_metadata_generator.py. Output compile/opcode_ids_gen.go + compile/opcode_metadata_gen.go. Parity test: the generated tables equal compile/opcodes_gen.go + compile/opcode_caches.go for every opcode currently in gopy. Once green, the hand-rolled files get deleted. | go test ./compile -run TestOpcodeMetadataParity green; deletion lands | DONE (2.1-2.3; 2.4 deletion pending) |
| 3 | Cache-layout emitter. New cache_struct_parser.py reads pycore_code.h, emits specialize/cache_layouts_gen.go with typed accessors ((*LoadGlobalCache).Index, .ModuleKeysVersion, etc) backed by the codeunit slice. Migrate every SetCacheCell / CacheCell call site in specialize/*.go and vm/eval_specialized_*.go to typed accessors. The LOAD_GLOBAL cell-1 vs cell-4 bug class becomes a compile error. | every specialize/* + eval_specialized_* file builds; existing tests green; one new test (TestCacheLayoutTypedAccess) asserts the struct sizes match _PyOpcode_Caches[] | TODO |
| 4 | Family + deopt emitter. Generator emits specialize/family_gen.go carrying the family table (map[Opcode][]Opcode), deopt map (Opcode→Opcode parent), and per-family cache-size guard. Replace specialize/quicken.go family literal + specialize/deopt.go map. Parity test: generated tables equal current hand-rolled ones for every opcode we specialize today. | parity test green; deletion lands | DONE (8abf069) |
| 5 | Tier-1 harness emitter. Generator emits vm/eval_dispatch_gen.go: the switch statement, per-opcode prologue (stack peek to typed locals), epilogue (stack push of typed outputs, cache advance), deopt path, error path. The body itself remains in hand-written Go: each opcode has a function op<NAME>(e *evalState, oparg uint32, in <inputs>) (out <outputs>, err error) whose signature is derived from the DSL and enforced by the generator. Any hand-written function whose signature diverges from the DSL stack effect is a build error. | every opcode in gopy currently routed through vm/eval_simple.go is now routed through vm/eval_dispatch_gen.go; full go test ./... green | TODO |
| 6 | Specialized arms. Same harness as Phase 5 but generates the specialized cases (LOAD_GLOBAL_MODULE etc), wires DEOPT_IF to a generated return deoptTo<PARENT>(e) shim. Hand-written specializer fast paths in vm/eval_specialized_*.go are reduced to per-opcode body functions; the cache decode, deopt branch, and cache advance live in the generator output. | every vm/eval_specialized_*.go file shrinks to opcode bodies only; the LOAD_GLOBAL bug's regression test (a fixture that stresses cache cell boundaries) is added | TODO |
| 7 | Tier-2 uop harness. gopy_tier2_generator.py emits vm/eval_uops_gen.go. The 14 uops 1712 hand-ported are re-rooted onto generated harness; remaining ~270 uops become trivial Go-body inserts. Parity test: for every uop name shared with tier-1 (e.g. _LOAD_FAST), the tier-1 body and the tier-2 uop body call into the same opcode-body function. | tier-2 trace executor runs the same 1712 P2 microbenchmark with hand-rolled and generated uops side by side; perf within 2%, results identical | TODO |
| 8 | Body translation pilot. Pick 10 trivial opcodes (NOP, POP_TOP, POP_TOP_LOAD_CONST_INLINE_BORROW, LOAD_FAST, LOAD_FAST_BORROW, STORE_FAST, LOAD_CONST, RETURN_VALUE, RESUME_CHECK, END_FOR). Add a C-body→Go-body translator under tools/cases_generator/body_translator.py that handles the constrained subset of C these bodies use. Generated Go bodies replace the hand-written ones; CI must stay green. | 10 opcodes have zero hand-written Go; CI green; size of the body-translator subset documented in tools/cases_generator/SUBSET.md | TODO |
| 9 | Body translation scale-up. Translate every remaining opcode body. Each opcode that survives translation has its hand-written Go body deleted. Opcodes that the translator cannot handle (calls into runtime helpers gopy spells differently, refcount idioms gopy doesn't have because Go is GC'd) stay hand-written but with a generator-emitted stub asserting the signature. | hand-written opcode bodies count drops below 30 (escape hatches only, documented per opcode); reproducibility gate green | TODO |
| Gate | tools/regen-cases.sh && git diff --exit-code runs in CI. Any drift between source DSL and emitted Go fails the build. | gate green | TODO |

Phase 0 — vendor the generator

Bring Tools/cases_generator/ into the gopy repo under tools/cases_generator/ verbatim. The directory layout mirrors CPython's: lexer.py, plexer.py, parser.py, parsing.py, analyzer.py, stack.py, cwriter.py, generators_common.py, and the nine per-target emitters. No edits. The vendored copy carries a header comment naming the CPython commit it was pulled from, identical to how stdlib/ files name their source.

Inputs: Python/bytecodes.c, Python/optimizer_bytecodes.c, Include/internal/pycore_code.h get mirrored under tools/cases_generator/inputs/. Same pin: CPython 3.14.5. Spec 1707 (CPython 3.14.x sync) is the upstream rollup; this spec adds three rows to 1707's checklist for these inputs.

Driver: tools/regen-cases.sh is a thin bash wrapper that exports PYTHONPATH=tools/cases_generator and invokes each generator with the right input files and output paths. In Phase 0 the script emits only into a scratch directory under /tmp and runs diff against CPython's actual output in $HOME/cpython-314/Python/ and $HOME/cpython-314/Include/internal/. A green diff proves the toolchain is wired correctly before any Go work starts.

The Phase 0 gate runs in CI: tools/regen-cases.sh --check-upstream regenerates the C files into /tmp, diffs against the vendored CPython tree, fails on any divergence. This catches accidental edits to the vendored generator and the case where a CPython 3.14 patch release moves the inputs without touching the generator.

| Step | Status | Commit |
| --- | --- | --- |
| Vendor Tools/cases_generator/ under Tools/cases_generator/ | DONE | - |
| Mirror bytecodes.c, optimizer_bytecodes.c, pycore_code.h under Tools/cases_generator/inputs/ | DONE | - |
| Tools/regen-cases/ (Go driver) invokes each upstream generator into a scratch dir | DONE | - |
| go run ./Tools/regen-cases --check-upstream diff-clean against CPython 3.14.5 generated files | DONE | - |
| CI job cases-generator-upstream-parity green | TODO | - |

Phase 1 — output abstraction

CPython's cwriter.py is a 179-line indent-tracking C-syntax sink. It exposes emit(text), start_line(), block() (context manager for {...}), set_position(), set_lineno(). Generators write to a CWriter and get well-formatted C with #line directives.

Phase 1 introduces tools/cases_generator/gowriter.py with the same surface area, emitting Go. The block() context emits { / } the same way; start_line() honors gofmt-friendly indentation; set_lineno() emits //line directives keyed to the DSL source location so that runtime panics in generated code point back to Python/bytecodes.c.
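The writer surface described above is small. The sketch below renders the same idea in Go for consistency with the other examples in this spec; the planned gowriter.py is a Python file, and the `goWriter` type with its `emit`, `block`, and `lineDirective` methods is a hypothetical stand-in for its API. The source line number used in the example is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// goWriter is an indent-tracking sink for generated Go source.
type goWriter struct {
	sb     strings.Builder
	indent int
}

// emit writes one line at the current indent level (tabs, gofmt style).
func (w *goWriter) emit(format string, args ...any) {
	w.sb.WriteString(strings.Repeat("\t", w.indent))
	fmt.Fprintf(&w.sb, format, args...)
	w.sb.WriteByte('\n')
}

// block emits "header {", runs body one level deeper, then emits "}".
func (w *goWriter) block(header string, body func()) {
	w.emit("%s {", header)
	w.indent++
	body()
	w.indent--
	w.emit("}")
}

// lineDirective emits a //line comment. The Go toolchain only honors the
// directive when it starts in column 1, so it bypasses indentation.
func (w *goWriter) lineDirective(file string, line int) {
	fmt.Fprintf(&w.sb, "//line %s:%d\n", file, line)
}

// render builds a tiny function to show the three calls working together.
func render() string {
	w := &goWriter{}
	w.block("func opNOP(e *evalState)", func() {
		w.lineDirective("Python/bytecodes.c", 123) // hypothetical line number
		w.emit("return")
	})
	return w.sb.String()
}

func main() {
	fmt.Print(render())
}
```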

A second file, go_generators_common.py, mirrors generators_common.py but binds DSL macros to Go expressions. The constant macros (the ones whose expansion does not depend on the surrounding stack effect) are the Phase 1 deliverable:

| DSL macro | Go expansion |
| --- | --- |
| PyStackRef_AsPyObjectBorrow(r) | r.AsObject() |
| PyStackRef_FromPyObjectNew(o) | stackref.FromObject(o) |
| PyStackRef_FromPyObjectImmortal(o) | stackref.FromObjectImmortal(o) |
| PyStackRef_IsNull(r) | r.IsNull() |
| STACK_SHRINK(n) | (handled by the harness, not the body) |
| STACK_GROW(n) | (handled by the harness) |
| PEEK(i) | (handled by the harness) |
| POKE(i, v) | (handled by the harness) |
| Py_INCREF(o) / Py_DECREF(o) | (no-op; Go is GC'd) |
| Py_XDECREF(o) | (no-op) |
| JUMPBY(n) | e.jumpBy(n) |
| next_instr | e.f.NextInstr |
| _PyFrame_GetCode(frame) | e.f.Code |
| frame->localsplus[i] | e.f.Locals[i] |
| tstate | e (the evalState) |
| oparg | oparg (passed as parameter) |
| DEOPT_IF(cond) | if cond { return 0, false } (specialized arms) |
| EXIT_IF(cond) | if cond { return e.tier2Exit() } (tier-2 only) |
| ERROR_IF(cond, label) | if cond { return e.<label>() } |
| DECREF_INPUTS() | (no-op; harness clears stack refs) |
| DEAD(name) | (no-op; informational) |

The Phase 1 corpus is 30 hand-written macro snippets covering each binding above. The test is mechanical: gowriter.py emits the binding, the result compiles, the result matches a checked-in golden file. No real opcode bodies translated yet.
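One row of the binding table can be sketched as a statement-level rewrite. The example is written in Go to match the other examples here, although go_generators_common.py itself is Python; the `rewrite` helper and the `expansions` map are hypothetical, and the sketch only handles a whole statement of the form NAME(cond);. The condition is passed through untouched, so translating the C expression inside it remains a separate problem.

```go
package main

import (
	"fmt"
	"strings"
)

// expansions maps a one-argument DSL macro to a Go statement template taken
// from the binding table; %s receives the condition text.
var expansions = map[string]string{
	"DEOPT_IF": "if %s { return 0, false }",
	"EXIT_IF":  "if %s { return e.tier2Exit() }",
}

// rewrite expands a single "NAME(cond);" statement. It reports false and
// returns the input unchanged when the statement is not a known macro.
func rewrite(stmt string) (string, bool) {
	stmt = strings.TrimSpace(stmt)
	open := strings.IndexByte(stmt, '(')
	if open < 0 || !strings.HasSuffix(stmt, ");") {
		return stmt, false
	}
	tmpl, ok := expansions[stmt[:open]]
	if !ok {
		return stmt, false
	}
	// Everything between the first "(" and the trailing ");" is the
	// condition, so nested calls such as f(b) survive intact.
	cond := stmt[open+1 : len(stmt)-2]
	return fmt.Sprintf(tmpl, cond), true
}

func main() {
	for _, s := range []string{"DEOPT_IF(a != f(b));", "EXIT_IF(!ok);", "x = y;"} {
		out, changed := rewrite(s)
		fmt.Printf("%-24s => %s (rewritten=%v)\n", s, out, changed)
	}
}
```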

| Step | Status | Commit |
| --- | --- | --- |
| Tools/cases_generator/gowriter.py lands | DONE | - |
| Tools/cases_generator/go_generators_common.py lands with the binding table above | DONE | - |
| 30-snippet golden-file corpus under Tools/cases_generator/testdata/snippets/ | PARTIAL (20/30) | - |
| go test ./Tools/regen-cases -run TestSnippetParity (Go harness shelling out to Python) green | DONE | - |

Phase 2 — opcode IDs + metadata

Two Go emitters land. gopy_opcode_id_generator.py walks the analyzer's opcode list and emits compile/opcode_ids_gen.go:

// Code generated by tools/cases_generator. DO NOT EDIT.
// Source: Python/bytecodes.c (CPython 3.14.5)
package compile

const (
	NOP                 Opcode = 0
	RESERVED            Opcode = 17
	LOAD_FAST           Opcode = 85
	LOAD_GLOBAL         Opcode = 91
	LOAD_GLOBAL_MODULE  Opcode = 158
	LOAD_GLOBAL_BUILTIN Opcode = 159
	// ...
)

var OpcodeName = map[Opcode]string{ ... }

gopy_opcode_metadata_generator.py emits compile/opcode_metadata_gen.go:

package compile

// CacheSize is the number of codeunits the inline cache occupies
// for this opcode. Generated from family() declarations and the
// macro() cache-cell list in Python/bytecodes.c.
var CacheSize = map[Opcode]int{
	LOAD_GLOBAL:         4,
	LOAD_GLOBAL_MODULE:  4,
	LOAD_GLOBAL_BUILTIN: 4,
	LOAD_ATTR:           9,
	// ...
}

// StackEffect carries the popped/pushed counts derived from the
// DSL stack effect declaration. Both counts may depend on oparg
// for variadic opcodes (BUILD_TUPLE, CALL, etc); in that case
// StackEffect.PoppedFn / PushedFn is non-nil.
var StackEffect = map[Opcode]Effect{ ... }

// Family lists the specialized arms of each adaptive opcode.
var Family = map[Opcode][]Opcode{
	LOAD_GLOBAL: {LOAD_GLOBAL_MODULE, LOAD_GLOBAL_BUILTIN},
	// ...
}

// Deopt is the inverse: every specialized arm maps to its family
// parent.
var Deopt = map[Opcode]Opcode{
	LOAD_GLOBAL_MODULE:  LOAD_GLOBAL,
	LOAD_GLOBAL_BUILTIN: LOAD_GLOBAL,
	// ...
}

The parity test (compile/opcode_metadata_parity_test.go) asserts that the generated maps cover every key currently in compile/opcode_caches.go with the same value. When green, the hand-rolled file gets deleted in the same commit; the generated file becomes the only source.
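The parity rule stated above (every hand-rolled key must appear in the generated table with the same value) is a one-directional map comparison. A minimal sketch, using string keys and a hypothetical `parityDiffs` helper instead of the real Opcode type; the mismatching values in `main` are invented to show the output shape.

```go
package main

import (
	"fmt"
	"sort"
)

// parityDiffs lists every key of hand that is missing from gen or maps to a
// different value. Extra keys in gen are allowed, since the generated table
// may cover opcodes the hand-rolled one never listed.
func parityDiffs(hand, gen map[string]int) []string {
	var diffs []string
	for op, want := range hand {
		got, ok := gen[op]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("%s: missing from generated table", op))
		case got != want:
			diffs = append(diffs, fmt.Sprintf("%s: hand-rolled %d, generated %d", op, want, got))
		}
	}
	sort.Strings(diffs) // deterministic report order
	return diffs
}

func main() {
	hand := map[string]int{"LOAD_GLOBAL": 4, "LOAD_ATTR": 8}
	gen := map[string]int{"LOAD_GLOBAL": 4, "LOAD_ATTR": 9, "NOP": 0}
	for _, d := range parityDiffs(hand, gen) {
		fmt.Println(d)
	}
}
```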

StepStatusCommit
tools/cases_generator/gopy_opcode_id_generator.pyDONE78e434b
tools/cases_generator/gopy_opcode_metadata_generator.pyDONEthis commit
compile/opcode_ids_gen.go + compile/opcode_metadata_gen.go checked inDONEthis commit
Parity test greenDONE (3 tests; YIELD_VALUE escapes-flag delta logged, generator wins)this commit
compile/opcode_caches.go deleted; references redirectedTODO-

Phase 3 — typed cache layouts

The biggest single bug-class reduction in this spec. Today, specialize.SetCacheCell(code, instr, 1, idx) writes a magic number to a magic offset. The relationship between cell number and the meaning of that cell lives in a comment, sometimes wrong.

Phase 3 ships tools/cases_generator/cache_struct_parser.py. It reads Include/internal/pycore_code.h, parses each _Py<Op>Cache struct, emits specialize/cache_layouts_gen.go:

// Code generated by tools/cases_generator. DO NOT EDIT.
// Source: Include/internal/pycore_code.h (CPython 3.14.5)
package specialize

import "github.com/tamnd/gopy/compile"

// LoadGlobalCache mirrors Include/internal/pycore_code.h:117
// struct _PyLoadGlobalCache. Each codeunit cell maps to one
// uint16 field. Field offsets are checked at init time against
// compile.CacheSize[compile.LOAD_GLOBAL].
type LoadGlobalCache struct {
	code  []byte
	instr int
}

func LoadGlobalCacheAt(code []byte, instr int) LoadGlobalCache { ... }
func (c LoadGlobalCache) Counter() uint16 { ... }
func (c LoadGlobalCache) SetCounter(v uint16) { ... }
func (c LoadGlobalCache) Index() uint16 { ... }
func (c LoadGlobalCache) SetIndex(v uint16) { ... }
func (c LoadGlobalCache) ModuleKeysVersion() uint16 { ... }
func (c LoadGlobalCache) SetModuleKeysVersion(v uint16) { ... }
func (c LoadGlobalCache) BuiltinKeysVersion() uint16 { ... }
func (c LoadGlobalCache) SetBuiltinKeysVersion(v uint16) { ... }

Every call site in specialize/*.go and vm/eval_specialized_*.go migrates to typed access. The specializer writes cache.SetIndex(uint16(idx)). The VM reads cache.Index(). Cell-4 vs cell-1 cannot happen: there is no cell 4, the struct has four fields and the type system enforces the mapping.

A new test (specialize/cache_layout_size_test.go) asserts that the codeunit size implied by each typed struct matches compile.CacheSize[op] for the family parent of that struct. A struct that overflows the family's reserved size is a test failure.
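The accessor bodies elided above could look roughly like the following. This is a sketch under stated assumptions, not the generated file: it assumes `instr` is a codeunit index, that each codeunit is two little-endian bytes, and that field offsets count from the first cache codeunit. The field-index constants and `checkLayout` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Reserved cache width for the LOAD_GLOBAL family, in codeunits.
const loadGlobalCacheUnits = 4

// Field positions within the cache, in struct declaration order.
const (
	lgCounter = iota
	lgIndex
	lgModuleKeysVersion
	lgBuiltinKeysVersion
	lgFieldCount
)

type LoadGlobalCache struct {
	code  []byte
	instr int // codeunit index of the owning instruction (assumption)
}

func LoadGlobalCacheAt(code []byte, instr int) LoadGlobalCache {
	return LoadGlobalCache{code: code, instr: instr}
}

// cell returns the two bytes backing one field.
func (c LoadGlobalCache) cell(field int) []byte {
	off := 2 * (c.instr + 1 + field)
	return c.code[off : off+2]
}

func (c LoadGlobalCache) Index() uint16     { return binary.LittleEndian.Uint16(c.cell(lgIndex)) }
func (c LoadGlobalCache) SetIndex(v uint16) { binary.LittleEndian.PutUint16(c.cell(lgIndex), v) }
func (c LoadGlobalCache) BuiltinKeysVersion() uint16 {
	return binary.LittleEndian.Uint16(c.cell(lgBuiltinKeysVersion))
}
func (c LoadGlobalCache) SetBuiltinKeysVersion(v uint16) {
	binary.LittleEndian.PutUint16(c.cell(lgBuiltinKeysVersion), v)
}

// checkLayout mirrors the size test: the field count must equal the width
// reserved for the family.
func checkLayout() error {
	if lgFieldCount != loadGlobalCacheUnits {
		return fmt.Errorf("LoadGlobalCache has %d fields, family reserves %d", lgFieldCount, loadGlobalCacheUnits)
	}
	return nil
}

func main() {
	code := make([]byte, 2*(1+loadGlobalCacheUnits+1)) // instr + cache + next instr
	c := LoadGlobalCacheAt(code, 0)
	c.SetIndex(7)
	c.SetBuiltinKeysVersion(9)
	fmt.Println(c.Index(), c.BuiltinKeysVersion(), checkLayout())
}
```

Because callers name a field rather than a cell number, the last writable offset is fixed by the constant list, and there is no call that can address the codeunit after the cache.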

| Step | Status | Commit |
| --- | --- | --- |
| tools/cases_generator/cache_struct_parser.py | DONE | this commit |
| specialize/cache_layouts_gen.go lands with every _Py<Op>Cache typed | DONE | this commit |
| Migrate specialize/load_global.go + vm/eval_specialized_load_global.go to typed access | TODO | - |
| Migrate every other specialize/*.go + vm/eval_specialized_*.go | TODO | - |
| specialize/cache.go SetCacheCell / CacheCell deleted (no callers) | TODO | - |
| Cache-layout size test green | DONE | this commit |

Phase 4 — family + deopt

specialize/family_gen.go carries:

// Code generated by tools/cases_generator. DO NOT EDIT.
// Source: family() declarations in Python/bytecodes.c.
package specialize

import "github.com/tamnd/gopy/compile"

var Family = map[compile.Opcode][]compile.Opcode{
	compile.LOAD_GLOBAL: {
		compile.LOAD_GLOBAL_MODULE,
		compile.LOAD_GLOBAL_BUILTIN,
	},
	// ... 16 more
}

var DeoptParent = map[compile.Opcode]compile.Opcode{
	compile.LOAD_GLOBAL_MODULE:  compile.LOAD_GLOBAL,
	compile.LOAD_GLOBAL_BUILTIN: compile.LOAD_GLOBAL,
	// ...
}

Existing hand-rolled equivalents in specialize/quicken.go and specialize/deopt.go shrink to consumers of these tables. The parity test asserts that for every opcode in either side, the hand-rolled and generated versions agree. Once green, the literal tables are deleted; the consumers stay.

| Step | Status | Commit |
| --- | --- | --- |
| gopy_family_generator.py (or fold into Phase 2's metadata emitter) | DONE | 8abf069 |
| specialize/family_gen.go lands | DONE | 8abf069 |
| specialize/quicken.go + specialize/deopt.go consume generated tables | DONE | 8abf069 |
| Parity test green; old literals deleted | DONE | 8abf069 |

Phase 5 — tier-1 dispatch harness

The harness owns:

  1. The dispatch switch keyed on opcode.
  2. Per-opcode prologue: peek stack inputs into typed locals named per the DSL declaration.
  3. Per-opcode epilogue: push outputs, advance next_instr by 1 + CacheSize[op].
  4. Error path: every ERROR_IF(cond, label) becomes a generated helper call.
  5. Deopt path (for specialized arms): every DEOPT_IF(cond) becomes a return that signals fallback to the unspecialized arm.

The body is not generated in this phase. Each opcode has a hand-written Go function whose signature is fully determined by the DSL:

// Generator-derived signature:
// inst(LOAD_FAST, (-- value))
// produces:
func opLOAD_FAST(e *evalState, oparg uint32) (value stackref.Ref, err error) {
	return e.f.Locals[oparg], nil
}

// inst(BINARY_OP, (lhs, rhs -- res))
// produces:
func opBINARY_OP(e *evalState, oparg uint32, lhs, rhs stackref.Ref) (res stackref.Ref, err error) {
	...
}

The generator emits a stub per opcode at the bottom of vm/eval_dispatch_gen.go:

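// Compile-time assertion emitted per opcode. The explicit function type
// makes the build fail if op<NAME> is missing or its signature drifts;
// an untyped `var _ = op<NAME>` would only catch the missing case.
var _ func(e *evalState, oparg uint32, in <inputs>) (out <outputs>, err error) = op<NAME>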

A missing body or a wrong signature is a build error, not a runtime error. This is the entire point of the spec: the same class of bug that took down 1713 P2 work for half a day cannot exist by the end of Phase 5.
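One way to get a build-time signature check in Go is to assign each body function to the blank identifier with an explicit function type. The sketch below uses stand-in `ref` and `evalState` types rather than gopy's real ones; the two signatures follow the LOAD_FAST and BINARY_OP examples above.

```go
package main

import "fmt"

// Stand-ins for stackref.Ref and the VM's evalState (assumptions).
type ref struct{ v int }
type evalState struct{ locals []ref }

// Hand-written bodies.
func opLOAD_FAST(e *evalState, oparg uint32) (ref, error) {
	return e.locals[oparg], nil
}

func opBINARY_OP(e *evalState, oparg uint32, lhs, rhs ref) (ref, error) {
	return ref{lhs.v + rhs.v}, nil // placeholder arithmetic, not real BINARY_OP
}

// What a generator could emit: one typed assertion per opcode. Changing a
// parameter list or result list above, or deleting a function, stops the
// package from compiling.
var (
	_ func(*evalState, uint32) (ref, error)           = opLOAD_FAST
	_ func(*evalState, uint32, ref, ref) (ref, error) = opBINARY_OP
)

func main() {
	e := &evalState{locals: []ref{{10}, {20}}}
	a, _ := opLOAD_FAST(e, 0)
	b, _ := opLOAD_FAST(e, 1)
	sum, _ := opBINARY_OP(e, 0, a, b)
	fmt.Println(sum.v)
}
```

The assertions cost nothing at runtime: they declare no storage and exist only for the type checker.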

The harness also drives the cache: a specialized arm gets cache := LoadGlobalCacheAt(code, instr) injected before the body, so the body never reaches for raw codeunits.
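The split between harness and body can be illustrated with a toy dispatch step. Everything here is a simplified assumption: the `frame` type, the opcode numbers, and the cache size of 1 are invented for the example, and error and deopt paths are reduced to a returned error.

```go
package main

import "fmt"

type ref int

// frame is a toy stand-in for the VM frame.
type frame struct {
	stack []ref
	pc    int // index of the current instruction, in codeunits
}

// Hypothetical opcode numbers and cache widths, for illustration only.
const (
	opPopTop   = 1
	opBinaryOp = 2
)

var cacheSize = map[int]int{opBinaryOp: 1}

// bodyBinaryOp is the hand-written part: typed inputs in, typed outputs out.
func bodyBinaryOp(lhs, rhs ref) (ref, error) { return lhs + rhs, nil }

// step is the part a generator would own: stack prologue, body call,
// stack epilogue, and the advance past the instruction and its cache.
func step(f *frame, op int) error {
	switch op {
	case opBinaryOp:
		n := len(f.stack)
		lhs, rhs := f.stack[n-2], f.stack[n-1] // prologue: peek inputs
		res, err := bodyBinaryOp(lhs, rhs)
		if err != nil {
			return err // error path
		}
		f.stack = append(f.stack[:n-2], res) // epilogue: pop 2, push 1
	case opPopTop:
		f.stack = f.stack[:len(f.stack)-1]
	default:
		return fmt.Errorf("unknown opcode %d", op)
	}
	f.pc += 1 + cacheSize[op] // missing map entries read as zero
	return nil
}

func main() {
	f := &frame{stack: []ref{2, 3}}
	if err := step(f, opBinaryOp); err != nil {
		panic(err)
	}
	fmt.Println(f.stack, f.pc)
}
```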

| Step | Status | Commit |
| --- | --- | --- |
| gopy_tier1_generator.py lands (as Tools/bytecodes_gen Go emitter) | DONE | (prior phase) |
| vm/eval_dispatch_gen.go covers every unspecialized opcode (skeleton; bodies pending Phase 8) | PARTIAL (107 arms, bodies stubbed) | this commit |
| vm/eval_simple.go shrinks to evalLoop scaffolding only (frame setup, exit handling) | PARTIAL (NOP, POP_TOP routed) | this commit |
| go test ./vm green | DONE | this commit |
| CPython-parity harness (Tools/bytecodes_gen/cpython_parity_test.go) lifts Lib/test/test_generated_cases.py fixtures and prints rolling coverage | DONE (5 / 10 fixtures translate today) | f97a926 |
| Bytecodes.c coverage gauge (Tools/bytecodes_gen/cpython_coverage_test.go) walks every inst() in CPython 3.14.5's Python/bytecodes.c and reports the bail histogram | DONE (14 / 118 inst() bodies translate today) | a93336d |

CPython-parity gate

Tools/bytecodes_gen/cpython_parity_test.go is the spec's authority on action-translator faithfulness. Each fixture is a verbatim copy of an (input, output) pair from Lib/test/test_generated_cases.py, wrapped in BEGIN/END markers and fed through ParseBytecodes → AnalyzeInst → TranslateBody. Each row carries:

  • bail=true while the translator falls back to a panic-stub. The harness asserts the fallback note's prefix stays stable so a drift in error wording shows up as a test failure rather than a silent regression.
  • want=[...substr...] once the translator handles the shape. The harness asserts the rendered Go body contains every substring.

TestCPythonParityFixtures logs coverage: PASS / N fixtures translate (bail=B) so the porting auto-flow can read progress without parsing test output structure. Coverage growth is monotonic: a fixture never moves from bail=false back to bail=true. Rows are never removed; when CPython retires a test we mirror the deletion in a separate commit so blame stays honest.
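
The row semantics above can be sketched as follows. The fixture and outcome types, the field names, and the sample notes are hypothetical; the real harness in cpython_parity_test.go may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

// fixture is a hypothetical row shape; the real harness may differ.
type fixture struct {
	name       string
	bail       bool     // translator still falls back to a panic stub
	notePrefix string   // expected prefix of the fallback note when bail is true
	want       []string // substrings the rendered Go body must contain
}

// outcome is what the (stubbed) pipeline produced for one fixture.
type outcome struct {
	bailed bool
	note   string
	body   string
}

// check compares one outcome against its fixture row.
func check(f fixture, o outcome) error {
	if f.bail {
		if !o.bailed || !strings.HasPrefix(o.note, f.notePrefix) {
			return fmt.Errorf("%s: expected bail with stable note, got %q", f.name, o.note)
		}
		return nil
	}
	if o.bailed {
		return fmt.Errorf("%s: regressed to bail: %s", f.name, o.note)
	}
	for _, sub := range f.want {
		if !strings.Contains(o.body, sub) {
			return fmt.Errorf("%s: body missing %q", f.name, sub)
		}
	}
	return nil
}

func main() {
	rows := []fixture{
		{name: "pop", want: []string{"value.Close()"}},
		{name: "loop", bail: true, notePrefix: "unsupported statement"},
	}
	outs := []outcome{
		{body: "value.Close()"},
		{bailed: true, note: "unsupported statement: while"},
	}
	pass, bails := 0, 0
	for i, f := range rows {
		if err := check(f, outs[i]); err != nil {
			panic(err)
		}
		if f.bail {
			bails++
		} else {
			pass++
		}
	}
	fmt.Printf("coverage: %d / %d fixtures translate (bail=%d)\n", pass, len(rows), bails)
}
```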

Bytecodes.c coverage gauge

Tools/bytecodes_gen/cpython_coverage_test.go is the complementary, exhaustive gauge: it walks every inst() in CPython 3.14.5's Python/bytecodes.c, runs each through the full pipeline (ParseBytecodes → AnalyzeInst → TranslateBody), and groups the bail reasons. The headline number is N / total inst() bodies translate; the bail histogram (bail (count) reason names...) exposes which translator extension yields the most leverage.

A hard floor (const minTranslates) in the test refuses to let the count regress. Bump it (never down) when a translator change flips more bodies; the porting auto-flow reads that constant to know progress without scraping logs.
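
A sketch of the floor check and the bail histogram, under the assumption that each inst() yields an opcode name plus an optional bail reason. The result type and the sample reasons are invented for illustration; only the constant name minTranslates and its current value come from this spec.

```go
package main

import (
	"fmt"
	"sort"
)

// minTranslates mirrors the floor quoted in the spec; it is only ever raised.
const minTranslates = 14

// result is a hypothetical per-inst() outcome; an empty bail means it translated.
type result struct{ op, bail string }

// summarize counts translated bodies and groups the rest by bail reason.
func summarize(rs []result) (int, map[string][]string) {
	translated, hist := 0, map[string][]string{}
	for _, r := range rs {
		if r.bail == "" {
			translated++
			continue
		}
		hist[r.bail] = append(hist[r.bail], r.op)
	}
	return translated, hist
}

// checkFloor fails when the translated count drops below the floor.
func checkFloor(translated, floor int) error {
	if translated < floor {
		return fmt.Errorf("coverage regressed: %d < floor %d", translated, floor)
	}
	return nil
}

func main() {
	rs := []result{{"NOP", ""}, {"COPY", "sized input"}, {"SWAP", "sized input"}, {"IS_OP", "int local"}}
	translated, hist := summarize(rs)
	reasons := make([]string, 0, len(hist))
	for reason := range hist {
		reasons = append(reasons, reason)
	}
	// Largest bucket first: that is the extension with the most leverage.
	sort.Slice(reasons, func(i, j int) bool { return len(hist[reasons[i]]) > len(hist[reasons[j]]) })
	for _, reason := range reasons {
		fmt.Printf("bail (%d) %s %v\n", len(hist[reason]), reason, hist[reason])
	}
	fmt.Println(translated, checkFloor(translated, minTranslates))
}
```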

Caveat: the gauge tracks parser coverage — a body "translates" when no stage rejects it. Compile-correctness of the emitted Go is verified separately by the strict dispatchGenSupported whitelist, which gates which opcodes route through dispatchGen in the live eval loop. The two layers measure different things on purpose: the gauge tells us "how much of bytecodes.c does the translator parse without bailing", the whitelist tells us "which opcodes have we audited end-to-end and run in production".

Migration progress

The harness routes every opcode through one of three layers, in order: generated (dispatchGen, gated by dispatchGenSupported), hand-written staging (dispatchHandwritten), and the legacy trySimple panel. The goal is to flip every opcode into the generated column. Each row migrates exactly once. Update these tables whenever an opcode moves.
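
The layer order could be sketched like this. The map-based tables and the route function are hypothetical simplifications; only the layer names and the whitelist gate come from the text above.

```go
package main

import "fmt"

// arm is a hypothetical opcode handler; it returns its layer name for demonstration.
type arm func() string

var (
	// dispatchGenSupported gates which generated arms are actually consulted.
	dispatchGenSupported = map[string]bool{"NOP": true}

	genArms = map[string]arm{
		"NOP":      func() string { return "generated" },
		"FOR_ITER": func() string { return "generated" }, // emitted but not yet audited
	}
	handwrittenArms = map[string]arm{"FOR_ITER": func() string { return "staging" }}
	legacyArms      = map[string]arm{"IMPORT_NAME": func() string { return "legacy" }}
)

// route tries the three layers in order and reports which one handled op.
func route(op string) (string, bool) {
	if a, ok := genArms[op]; ok && dispatchGenSupported[op] {
		return a(), true
	}
	if a, ok := handwrittenArms[op]; ok {
		return a(), true
	}
	if a, ok := legacyArms[op]; ok {
		return a(), true
	}
	return "", false
}

func main() {
	for _, op := range []string{"NOP", "FOR_ITER", "IMPORT_NAME"} {
		layer, _ := route(op)
		fmt.Println(op, layer)
	}
}
```

In this sketch an arm that exists in the generated table but is absent from the whitelist still falls through to its handwritten version, which is the property that lets bodies be emitted before they are audited.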

Generated (dispatchGen)

These arms come straight out of Tools/bytecodes_gen and the action translator. The whitelist in vm/dispatch_gen_whitelist.go controls which ones the dispatcher actually consults; an opcode lands here only after its generated body has been audited byte-equivalent to the prior arm.

Each row carries a Status (generated once it lands in dispatchGen and the whitelist; staging while still in dispatchHandwritten; legacy while still in trySimple and friends) and a Commit stamp so the porting auto-flow can diff the tables against the tree without re-reading every panel.

| Opcode | Status | Commit | Translator shape | Notes |
| --- | --- | --- | --- | --- |
| NOP | generated | 71b1fd1 | empty body | trivial |
| POP_TOP | generated | 71b1fd1 | PyStackRef_CLOSE(value) | exercises stack-ref close |
| JUMP_FORWARD | generated | 0d54073 | JUMPBY(oparg) | body-driven terminator |
| PUSH_NULL | generated | 0d47064 | output = PyStackRef_NULL | output-assignment statement |
| LOAD_FAST | generated | b842a4a | output = PyStackRef_DUP(GETLOCAL(oparg)) | GETLOCAL rvalue + Dup |
| LOAD_FAST_BORROW | generated | b842a4a | same body as LOAD_FAST | borrow collapses under Go GC |
| LOAD_FAST_AND_CLEAR | generated | b842a4a | LOAD_FAST plus GETLOCAL(oparg) = PyStackRef_NULL | GETLOCAL lvalue |
| STORE_FAST | generated | b842a4a | _PyStackRef tmp = GETLOCAL(oparg); GETLOCAL(oparg) = value; PyStackRef_XCLOSE(tmp) | C-local decl + lvalue |
| JUMP_BACKWARD_NO_INTERRUPT | generated | 337d126 | JUMPBY(-oparg) | shares JUMPBY body with JUMP_FORWARD; JUMP_BACKWARD proper stays handwritten for breaker poll |
| END_SEND | generated | 8ac6d1c | val = value; DEAD(value); PyStackRef_CLOSE(receiver) | bit-equivalent to handwritten body in eval_simple.go |
| LOAD_BUILD_CLASS | generated | 55440dc | int err = PyMapping_GetOptionalItem(BUILTINS(), &_Py_ID(__build_class__), &bc_o) + NameError when absent | first Bucket B flip; lit _PyErr_SetString payload now flows through setPendingErr |
| SETUP_ANNOTATIONS | generated | 07aa060 | LOCALS() + PyMapping_GetOptionalItem(LOCALS(), &_Py_ID(__annotations__), &ann_dict) + PyDict_New() fallback + PyObject_SetItem(LOCALS(), &_Py_ID(__annotations__), ann_dict) | second Bucket B flip; PyDict_New registered as expression-side helper; EvalCode now defaults f.Locals = globals for module frames so LOCALS() matches CPython at module scope |
| LOAD_FROM_DICT_OR_GLOBALS | generated | 40a53e3 | GETITEM(FRAME_CO_NAMES, oparg) + _PyDict_LoadGlobal cascade | A6 helper-call vocabulary registered the dict-globals lookup |
| LOAD_SMALL_INT | generated | a7a4f7f | PyStackRef_FromPyObjectBorrow(_PyLong_GetSmallInt(oparg)) | small-int constant pool |
| LOAD_LOCALS | generated | a7a4f7f | LOCALS() lifted into Go via e.frame.Locals() | Bucket B LOCALS() shim |
| UNARY_NEGATIVE | generated | e2c5275 | PyNumber_Negative(value) → objects.NumberNegative | Bucket B helper |
| UNARY_INVERT | generated | e2c5275 | PyNumber_Invert(value) → objects.NumberInvert | Bucket B helper |
| UNARY_NOT | generated | e2c5275 | int err = PyObject_IsTrue(...) → objects.IsTruthy; output is PyStackRef_True/False | shares IsTruthy plumbing with POP_JUMP_IF |
| LIST_APPEND | generated | f658e40 | int err = _PyList_AppendTakeRef(list, v) | A1 sized-input flip |
| SET_ADD | generated | f658e40 | int err = PySet_Add(set, v) | A1 sized-input flip |
| MAP_ADD | generated | f658e40 | int err = _PyDict_SetItem_Take2(dict, key, value) | A1 sized-input flip |
| DELETE_SUBSCR | generated | f658e40 | int err = PyObject_DelItem(container, sub) → objects.DelItem | A2 int-local + helper port |
| GET_LEN | generated | 44eff3d | Py_ssize_t len_i = PyObject_Length(obj) → objects.Length | A7 C-type table (Py_ssize_t) |
| BUILD_STRING | generated | 44eff3d | _PyUnicode_JoinArray over the stackref slice | A8 STACKREFS_TO_PYOBJECTS macro |
| FORMAT_SIMPLE | generated | 44eff3d | if (!PyUnicode_CheckExact(value)) ... → objects.Str fallback | A3 if-statement parser |
| COPY | generated | 90c4fce | output = PyStackRef_DUP(bottom) over a sized-input region | A1 sized-input flip |
| SWAP | generated | 90c4fce | swap top with top[1 - oparg] over a sized-input region | A1 sized-input flip |
| SET_UPDATE | generated | ab365fa | int err = _PySet_Update(set, iterable) | A1 sized-input flip |
| DICT_UPDATE | generated | ab365fa | int err = PyDict_Update(dict, mapping) | A1 sized-input flip |
| LOAD_COMMON_CONSTANT | generated | 4ac50b5 | PyStackRef_FromPyObjectImmortal(tstate->interp->common_consts[oparg]) | A7 C-type table (tstate); also drops dead END_SEND arm |
| POP_EXCEPT | generated | ee55ed4 | _PyErr_StackItem swap into tstate->exc_info | A7 C-type table (_PyErr_StackItem); routes through setHandledException |
| PUSH_EXC_INFO | generated | ee55ed4 | mirrors POP_EXCEPT in the opposite direction | A7 C-type table |
| STORE_GLOBAL | generated | d6ad44b | GETITEM(FRAME_CO_NAMES, oparg) + PyDict_SetItem(GLOBALS(), name, v) | A4 GETITEM helper |
| DELETE_GLOBAL | generated | d6ad44b | GETITEM(FRAME_CO_NAMES, oparg) + PyDict_DelItem(GLOBALS(), name) | A4 GETITEM helper |
| FORMAT_WITH_SPEC | generated | 67735f0 | PyObject_Format(value, format_spec) → objects.Format | Bucket B helper |
| GET_ITER | generated | 02e72c3 | PyObject_GetIter(iterable) → e.objectGetIter | Bucket B helper; routes errors via e.pendingErr |
| BUILD_LIST | generated | b0819a5 | _PyList_FromStackRefStealOnSuccess(values, oparg) → e.listFromStackRef | sized-input peek + drop; bottom-first order matches handwritten pop-in-reverse |
| BUILD_TUPLE | generated | b0819a5 | _PyTuple_FromStackRefStealOnSuccess(values, oparg) → e.tupleFromStackRef | sized-input peek + drop |
| BUILD_SLICE | generated | b0819a5 | PySlice_New(start, stop, step) → e.sliceNew over args[0..2] | step is nil when oparg==2 |
| BUILD_MAP | generated | 60c7912 | _PyDict_FromItems(values_o, 2, values_o+1, 2, oparg) → e.dictFromItems | bottom-first key/value pairs match handwritten reverse-pop order |
| BUILD_TEMPLATE | generated | 60c7912 | _PyTemplate_Build(strings, interpolations) → e.templateBuild | t-string runtime; helper is a thin objects.NewTemplateStr wrapper |
| GET_AWAITABLE | generated | 419072c | _PyEval_GetAwaitable(iter, opcode) → e.getAwaitable | helper already wired; flipped after auditing handwritten arm against generated body |
| GET_ANEXT | generated | 419072c | _PyEval_GetANext(aiter) → e.getANext | async-gen __anext__ wrapper; flipped after audit |

Porting backlog (organized by blocker)

Every inst() body in CPython 3.14.5 Python/bytecodes.c that does not yet route through dispatchGen lives in one of the buckets below. Each bucket corresponds to a single unblocker (a translator extension, a Go helper port, or an emitter rewrite). Landing one unblocker should flip the whole bucket in one step, with the per-opcode work being audit-and-whitelist rather than ad-hoc translation. This replaces the previous staging/legacy split: opcodes are no longer ordered by which Go panel happens to host their handwritten arm, they are ordered by what we have to build to retire them.

Counts come straight from the bytecodes.c coverage gauge (TestCPythonBytecodesCoverage). The opcode list under each row is verbatim from the bail histogram; keep it in sync when the gauge histogram changes.

Bucket A — translator extensions

These flip 50+ opcodes between them. They are the highest leverage work in Phase 5 and should land before any helper-port campaign starts.

| Bucket | Count | Opcodes | Unblock task |
| --- | --- | --- | --- |
| A1. Sized inputs/outputs (unused[oparg-1], values[oparg], etc.) | 11 | COPY, SWAP, DICT_MERGE, DICT_UPDATE, LIST_APPEND, LIST_EXTEND, MAP_ADD, RERAISE, SET_ADD, SET_UPDATE, UNPACK_EX | Teach tier1_arm.tmpl to peek (not auto-pop) sized regions and to declare passthrough outputs without shadowing the input. Reference: optimizer _COPY / _SWAP in optimizer/uops_impl.go:152. |
| A2. int C local (int flag = ...;) | 14 | BUILD_INTERPOLATION, CALL_ISINSTANCE, CHECK_EG_MATCH, CLEANUP_THROW, DELETE_SUBSCR, INSTRUMENTED_INSTRUCTION, INSTRUMENTED_LINE, INSTRUMENTED_POP_JUMP_IF_{TRUE,FALSE,NONE,NOT_NONE}, IS_OP, MATCH_MAPPING, MATCH_SEQUENCE | Add case "int": to the C-local statement walker in action.go, emit name := <int-expr> (Go bool for 0/1 flags where the only use site is a conditional). |
| A3. if / else statement | 8 | DELETE_FAST, EXIT_INIT_CHECK, FORMAT_SIMPLE, GET_YIELD_FROM_ITER, INSTRUMENTED_END_FOR, INSTRUMENTED_END_SEND, LOAD_FAST_CHECK, TO_BOOL_INT | Add an if (cond) { ... } [else { ... }] statement parser to the body walker. Existing expression parser already handles the test; only the statement-level shape is missing. |
| A4. GETITEM(consts/names, oparg) | 14 | DELETE_ATTR, DELETE_GLOBAL, DELETE_NAME, IMPORT_FROM, IMPORT_NAME, LOAD_CONST, LOAD_CONST_IMMORTAL, LOAD_CONST_MORTAL, LOAD_FROM_DICT_OR_GLOBALS, LOAD_NAME, LOAD_SUPER_ATTR_ATTR, LOAD_SUPER_ATTR_METHOD, STORE_GLOBAL, STORE_NAME | Recognise the GETITEM(FRAME_CO_CONSTS, oparg) / GETITEM(FRAME_CO_NAMES, oparg) idioms and emit e.constAt(int(oparg)) / e.nameAt(int(oparg)). Removes the long-standing LOAD_CONST.wrapConst handwritten arm. |
| A5. uint32_t / paired-fast locals | 4 | LOAD_FAST_LOAD_FAST, LOAD_FAST_BORROW_LOAD_FAST_BORROW, STORE_FAST_LOAD_FAST, STORE_FAST_STORE_FAST | Extend the C-local walker to uint32_t and add the high/low oparg decode helper (uint32_t loparg = oparg & 0xF; uint32_t hiparg = oparg >> 4;). |
| A6. PyObject *x = <call-returning-Object> | 4 | LOAD_FROM_DICT_OR_DEREF, LOAD_BUILD_CLASS, SETUP_ANNOTATIONS, WITH_EXCEPT_START | The existing PyObject * walker only accepts PyStackRef_AsPyObject{Borrow,Steal}(...) on the RHS. Generalise it to accept any expression that resolves to objects.Object. |
| A7. C-typed declarations to keep as Go locals | 13 | LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN (PyTypeObject), GET_LEN (Py_ssize_t), MAKE_FUNCTION + RETURN_GENERATOR (PyFunctionObject), COPY_FREE_VARS + ENTER_EXECUTOR (PyCodeObject), LOAD_DEREF + STORE_DEREF (PyCellObject), POP_EXCEPT + PUSH_EXC_INFO (_PyErr_StackItem), CACHE + RESERVED (Py_FatalError stub), SET_FUNCTION_ATTRIBUTE (size_t), CONVERT_VALUE (conversion_func), YIELD_VALUE (frame), INTERPRETER_EXIT (tstate), EXTENDED_ARG (opcode), GET_AITER (unaryfunc), CALL_LIST_APPEND (PyInterpreterState), LOAD_COMMON_CONSTANT + RESUME_CHECK (tstate / _Py_emscripten_signal_clock) | Per-type table of C-type → Go-type mappings (e.g. Py_ssize_t → int, size_t → int, PyCellObject → *objects.Cell). Each entry takes <10 lines once the table is in place. |
| A8. Misc parser rough edges | 9 | LOAD_SMALL_INT (literal sign-extend cast), RAISE_VARARGS + BUILD_SLICE (ternary ?: in call args), CHECK_EXC_MATCH (output name b written from PyStackRef_True/False), INSTRUMENTED_FOR_ITER (output redeclares input name), LOAD_LOCALS (LOCALS() macro), DELETE_DEREF (PyCell_SwapTakeRef), STACKREFS_TO_PYOBJECTS macro (BUILD_MAP, BUILD_STRING), INSTRUMENTED_JUMP family (3 — INSTRUMENTED_JUMP_FORWARD, INSTRUMENTED_NOT_TAKEN, INSTRUMENTED_POP_ITER) | One small parser fix each; group into one commit. The ternary + STACKREFS_TO_PYOBJECTS items are real translator features; the rest are one-line typos in the action walker. |
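
For the A5 row, the oparg decode quoted in the table amounts to a nibble split. The helper name splitOparg is hypothetical; the masks are the ones written in the row.

```go
package main

import "fmt"

// splitOparg decodes a paired-fast oparg using the masks quoted in A5:
// loparg = oparg & 0xF, hiparg = oparg >> 4.
func splitOparg(oparg uint32) (loparg, hiparg uint32) {
	return oparg & 0xF, oparg >> 4
}

func main() {
	lo, hi := splitOparg(0x21)
	fmt.Println(lo, hi) // prints "1 2"
}
```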
Bucket B — Go helper ports

After Bucket A lands, every remaining bail reduces to "the translator wants to emit a call but the Go target does not exist yet". The action translator already knows how to render the call shape (<callee>(args...)); we just need the callee to compile. Each helper below is a single CPython function we port to gopy's objects/ (or module/) package. Once the Go helper exists, the translator emits the call verbatim and the opcode flips.

Each row carries a Status (DONE once the opcode is in the dispatchGenSupported whitelist; TODO while the helper does not exist or the opcode is still routed through the handwritten panel) and a Commit stamp (the commit that flipped the opcode through dispatchGen). Flip rows in step with the Phase 5.2 audit table above.

| Helper (CPython → gopy) | Opcodes unblocked | Status | Commit |
| --- | --- | --- | --- |
| PyNumber_Negative → objects.NumberNegative | UNARY_NEGATIVE | DONE | e2c5275 |
| PyNumber_Invert → objects.NumberInvert | UNARY_INVERT | DONE | e2c5275 |
| PyObject_Format → objects.Format | FORMAT_WITH_SPEC | DONE | 67735f0 |
| PyObject_GetIter → objects.GetIter (already exists; just wire the _Py_GatherStats_GetIter instrumentation stub) | GET_ITER | DONE | 02e72c3 |
| PySet_New → objects.NewSet([]Object) | BUILD_SET | TODO | - |
| PyCell_New → objects.NewCell | MAKE_CELL | TODO | - |
| _PyList_FromStackRefStealOnSuccess → wrapper over objects.NewList | BUILD_LIST | DONE | b0819a5 |
| _PyTuple_FromStackRefStealOnSuccess → wrapper over objects.NewTuple | BUILD_TUPLE | DONE | b0819a5 |
| _PyTemplate_Build → objects.BuildTemplate (t-string runtime) | BUILD_TEMPLATE | DONE | 60c7912 |
| _PyEval_GetAwaitable → objects.GetAwaitable | GET_AWAITABLE | DONE | 419072c |
| _PyEval_GetANext → objects.GetANext | GET_ANEXT | DONE | 419072c |
| _PyEval_MatchClass → objects.MatchClass | MATCH_CLASS | TODO | - |
| _PyEval_MatchKeys → objects.MatchKeys | MATCH_KEYS | TODO | - |
| _PyIntrinsics_UnaryFunctions / _PyIntrinsics_BinaryFunctions tables → vm/intrinsics.go lookup | CALL_INTRINSIC_1, CALL_INTRINSIC_2 | TODO | - |
| PyStackRef_MakeHeapSafe → stackref.MakeHeapSafe (escapes after a yield/return; trivial under Go GC) | RETURN_VALUE | TODO | - |
| LOCALS() → e.frame.Locals() (combined with A8 misc) | LOAD_LOCALS | DONE | a7a4f7f |
| PyCell_SwapTakeRef → objects.Cell.SwapTakeRef (combined with A8) | DELETE_DEREF | TODO | - |
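
One way to picture the "emit the call verbatim once the callee exists" rule is a registry lookup that bails when a helper has not been ported. The helpers map and renderCall below are hypothetical; the three entries mirror rows marked DONE in the table.

```go
package main

import (
	"fmt"
	"strings"
)

// helpers maps a CPython callee to its Go target. The real translator may
// store this differently; the entries mirror rows marked DONE above.
var helpers = map[string]string{
	"PyNumber_Negative": "objects.NumberNegative",
	"PyNumber_Invert":   "objects.NumberInvert",
	"PyObject_Format":   "objects.Format",
}

// renderCall emits the Go call text, or bails when the helper is not ported yet.
func renderCall(callee string, args ...string) (string, error) {
	target, ok := helpers[callee]
	if !ok {
		return "", fmt.Errorf("bail: no Go helper for %s", callee)
	}
	return target + "(" + strings.Join(args, ", ") + ")", nil
}

func main() {
	fmt.Println(renderCall("PyNumber_Negative", "value"))
	fmt.Println(renderCall("PySet_New", "values")) // still TODO in the table
}
```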

After Bucket A + B, the only opcodes still routed through trySimple / tryImport / tryGen / tryMatch are the ones whose CPython bodies are structurally divergent from gopy (LOAD_CONST's wrapConst once we delete it, FOR_ITER / SEND / specialized CALL family with cache-driven control flow). Those are Phase 6 work; they stay in their handwritten panels until then.

Bucket C — structurally divergent (Phase 6 work, listed here for completeness)

These bodies will not flip in Phase 5 because the CPython shape disagrees with the gopy runtime in load-bearing ways. They are tracked here so a future audit doesn't try to "fix" them by extending the translator.

| Opcode | Divergence |
| --- | --- |
| FOR_ITER family | Cache-driven deopt + specialized fast paths; needs the Phase 6 specialized-arm harness. |
| SEND / SEND_GEN | Generator-frame swap that the gopy generator runtime does differently. |
| CALL / CALL_KW / CALL_FUNCTION_EX family | gopy's call sites already share a single helper; the CPython body would force a re-split. Wait for the specialized harness. |
| RESUME (and friends) | Eval breaker poll has gopy-specific signal hooks. |
| JUMP_BACKWARD (non-NO_INTERRUPT) | Same — breaker poll. |
| BINARY_OP | gopy resolves the op via slot dispatch, not via a helper-table indexed by oparg. Wait until slot-dispatch lands on the generated side. |
Sequenced plan

The buckets above are independent enough to parallelise across sessions, but the dependency order is:

  1. A1 (sized I/O emitter rewrite) first. Without it, no sized opcode can flip and the translator continues to grow ad-hoc workarounds. ~11 opcodes flip on landing.
  2. A4 (GETITEM) + A6 (general PyObject * RHS) together. They unblock the LOAD_/STORE_/DELETE_NAME/GLOBAL/CONST cluster and the LOAD_BUILD_CLASS group. ~18 opcodes flip.
  3. A2 (int) + A3 (if) together. They share a parser surface (statement-level recognition + bool/int locals) and land ~22 opcodes.
  4. A5 (paired-fast) + A7 (C-type table) + A8 (misc). Cleanup pass. ~26 opcodes.
  5. Bucket B — schedule each helper port as its own task. They are now linear: each unblocks one (sometimes two) opcodes.

After step 4, parser coverage in the bytecodes.c gauge should be ≥80 / 118 (current floor 14). After bucket B it should be ≥95; the rest are bucket C and stay on the Phase 6 list.

Tracking
  • Tools/bytecodes_gen/cpython_coverage_test.go is the gauge; const minTranslates is the floor and bumps as each bucket lands.
  • vm/dispatch_gen_whitelist.go is the production gate; an opcode lands here once its generated body is bit-equivalent to the previous handwritten arm.
  • vm/eval_dispatch_handwritten.go shrinks one entry at a time as opcodes graduate. When empty, Phase 5.3 closes.
  • vm/eval_simple.go / vm/eval_import.go / vm/eval_gen.go / vm/eval_match.go shrink in step. When all four are gone, Phase 5.4 closes.

Phase 6 — specialized arms

Same harness; the specialized switch arms live in vm/eval_dispatch_gen.go alongside the parent's case. The body functions for specialized opcodes have the same calling convention as Phase 5; the harness handles cache decode, the deopt branch (returns a flag the harness interprets as "fall through to the parent's body"), and the cache advance.

The LOAD_GLOBAL regression test lands here: a fixture that specializes LOAD_GLOBAL, then introspects the generated bytecode to assert the next instruction's first byte is unchanged. The same fixture catches every future opcode whose cache crosses the boundary into the next instruction.
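
A minimal sketch of such a boundary check, assuming 16-bit code units whose low byte is the opcode and a four-cell LOAD_GLOBAL cache. The helper names and the fake specializers are invented; the opcode numbers 33 and 17 are the ones quoted earlier in this spec.

```go
package main

import "fmt"

type codeUnit uint16

const (
	pushNull            = 33 // opcode numbers quoted in this spec
	reserved            = 17
	loadGlobalCacheSize = 4 // assumed cache width in code units
)

func opcodeOf(u codeUnit) uint8 { return uint8(u & 0xFF) }

// writeCells simulates a specializer filling n cache cells after instr.
func writeCells(code []codeUnit, instr, n int) {
	for i := 1; i <= n; i++ {
		code[instr+i] = reserved
	}
}

// checkBoundary runs specialize and reports whether the opcode byte of the
// instruction after the cache survived.
func checkBoundary(code []codeUnit, instr, cacheSize int, specialize func([]codeUnit, int)) error {
	next := instr + 1 + cacheSize
	before := opcodeOf(code[next])
	specialize(code, instr)
	if after := opcodeOf(code[next]); after != before {
		return fmt.Errorf("next opcode changed from %d to %d", before, after)
	}
	return nil
}

// newCode builds LOAD_GLOBAL at slot 0, its cache, then PUSH_NULL.
func newCode() []codeUnit {
	code := make([]codeUnit, 6)
	code[5] = pushNull
	return code
}

func main() {
	good := func(c []codeUnit, i int) { writeCells(c, i, loadGlobalCacheSize) }
	bad := func(c []codeUnit, i int) { writeCells(c, i, loadGlobalCacheSize+1) }
	fmt.Println(checkBoundary(newCode(), 0, loadGlobalCacheSize, good))
	fmt.Println(checkBoundary(newCode(), 0, loadGlobalCacheSize, bad))
}
```

The second call reproduces the failure shape described for the 1713 P2 bug: the write one cell past the cache turns the following PUSH_NULL into RESERVED.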

| Step | Status | Commit |
| --- | --- | --- |
| Specialized cases emitted in vm/eval_dispatch_gen.go | TODO | - |
| vm/eval_specialized_*.go shrinks to bodies only | TODO | - |
| LOAD_GLOBAL cache-boundary regression test | TODO | - |
| Per-family boundary tests for the other ~10 specialized families | TODO | - |

Phase 7 — tier-2 uops

Same harness pattern against Python/optimizer_bytecodes.c for the override side and the same bytecodes.c for the shared op() bodies. The win is twofold:

  1. Tier-1 LOAD_FAST and tier-2 _LOAD_FAST route through the same opLOAD_FAST body function. They cannot disagree.
  2. The remaining ~270 uops 1712 has not hand-ported land for free: the generator emits their bodies (or, in the body-translation phases, generates them entirely).
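
The first point can be illustrated with two thin dispatchers over one body. The types and the tier1LoadFast / tier2LoadFast wrappers are hypothetical; only the idea of a shared opLOAD_FAST comes from the text above.

```go
package main

import "fmt"

type Ref int

type evalState struct {
	locals []Ref
	stack  []Ref
}

// opLOAD_FAST is the single shared body.
func opLOAD_FAST(e *evalState, oparg uint32) (Ref, error) {
	if int(oparg) >= len(e.locals) {
		return 0, fmt.Errorf("local %d out of range", oparg)
	}
	return e.locals[oparg], nil
}

// push is the epilogue both tiers share in this sketch.
func push(e *evalState, v Ref, err error) error {
	if err != nil {
		return err
	}
	e.stack = append(e.stack, v)
	return nil
}

// tier1LoadFast stands in for the tier-1 LOAD_FAST arm.
func tier1LoadFast(e *evalState, oparg uint32) error {
	v, err := opLOAD_FAST(e, oparg)
	return push(e, v, err)
}

// tier2LoadFast stands in for the tier-2 _LOAD_FAST uop.
func tier2LoadFast(e *evalState, oparg uint32) error {
	v, err := opLOAD_FAST(e, oparg)
	return push(e, v, err)
}

func main() {
	a := &evalState{locals: []Ref{10, 20}}
	b := &evalState{locals: []Ref{10, 20}}
	fmt.Println(tier1LoadFast(a, 1), tier2LoadFast(b, 1), a.stack, b.stack)
}
```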

A perf check runs the 1712 P2 microbenchmark suite before/after the cutover. The acceptance band is ±2%; a wider gap means the generated dispatch path is missing an inlining opportunity that the hand-rolled version had, and that's a generator fix.

| Step | Status | Commit |
| --- | --- | --- |
| gopy_tier2_generator.py lands | TODO | - |
| vm/eval_uops_gen.go covers every uop currently hand-rolled | TODO | - |
| Shared-body parity test (tier-1 LOAD_FAST ≡ tier-2 _LOAD_FAST) | TODO | - |
| Remaining ~270 uops emitted; tier-2 trace coverage on micro-bench corpus jumps | TODO | - |
| 1712 microbench ±2% before/after | TODO | - |

Phase 8 — body translation pilot

CPython's opcode bodies are C, but they use a tightly constrained subset of C. A first pass through every body in bytecodes.c shows the subset is:

  • Local variable declarations + assignments
  • if / else if / else
  • while (rare; mostly for stack juggling helpers)
  • switch (extremely rare; mostly oparg-dispatch in CALL family)
  • Calls to a fixed set of runtime helpers (PyObject_GetAttr, _PyLong_Add, etc.), each of which has a Go-side equivalent in gopy
  • The DSL-specific macros listed in Phase 1
  • goto error / goto exit_unwind (handled by harness)

What does not appear: pointer arithmetic, manual struct casts, #ifdef (a few of these exist, gated on debug builds; we ignore them), inline assembly, setjmp/longjmp.

The pilot translator handles the subset above for 10 opcodes:

| Opcode | Why this one |
| --- | --- |
| NOP | Empty body. Smoke test. |
| POP_TOP | One-line stack effect. |
| LOAD_FAST | Local-variable read. |
| LOAD_FAST_BORROW | Same but borrow-flavored stack ref. |
| STORE_FAST | Local-variable write. |
| LOAD_CONST | Constant table read. |
| RETURN_VALUE | Frame exit; tests the harness's exit path. |
| RESUME_CHECK | Tests DEOPT_IF translation. |
| END_FOR | Tests stack-shrink + jump. |
| POP_TOP_LOAD_CONST_INLINE_BORROW | Tests macro() composition. |

Each translated body lands as a generator output; the hand-written Go body is deleted. CI must stay green at the end of each opcode's migration.

The subset is documented in tools/cases_generator/SUBSET.md. Anything outside it is a "keep hand-rolled" escape hatch with a single-line justification.
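
To make the "inside the subset or keep hand-rolled" split concrete, here is a toy statement translator. It recognises only two macro statements, passes the condition text through unchanged, and bails on anything else; the emitted Go shapes (return deopt, .Close()) are invented for illustration and are not what the real translator produces.

```go
package main

import (
	"fmt"
	"strings"
)

// arg extracts X from "NAME(X)" and reports whether s has that shape.
func arg(s, name string) (string, bool) {
	if strings.HasPrefix(s, name+"(") && strings.HasSuffix(s, ")") {
		return s[len(name)+1 : len(s)-1], true
	}
	return "", false
}

// translateStmt handles two macro statements and bails on everything else.
// The emitted Go shapes are hypothetical.
func translateStmt(stmt string) (string, error) {
	s := strings.TrimSuffix(strings.TrimSpace(stmt), ";")
	if cond, ok := arg(s, "DEOPT_IF"); ok {
		return "if " + cond + " { return deopt }", nil
	}
	if x, ok := arg(s, "PyStackRef_CLOSE"); ok {
		return x + ".Close()", nil
	}
	return "", fmt.Errorf("outside subset, keep hand-rolled: %q", stmt)
}

func main() {
	for _, stmt := range []string{"DEOPT_IF(version != 0);", "PyStackRef_CLOSE(value);", "p = base + 2;"} {
		out, err := translateStmt(stmt)
		fmt.Println(out, err)
	}
}
```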

| Step | Status | Commit |
| --- | --- | --- |
| tools/cases_generator/body_translator.py covers the Phase 8 subset | TODO | - |
| 10 opcode bodies translated; hand-written deleted; CI green per-opcode | TODO | - |
| SUBSET.md describes covered + uncovered constructs | TODO | - |

Phase 9 — body translation scale-up

Run the translator against every opcode. Expectations after Phase 9:

  • ~270 of ~285 uops fully generated (body + harness).
  • ~110 of ~140 tier-1 opcodes fully generated.
  • ~30 opcodes flagged as "manual body required" with a comment citing the construct (typically: a CPython helper gopy spells differently, or a CPython-specific refcount idiom).
  • Reproducibility gate green: tools/regen-cases.sh produces no diff against the committed *_gen.go files.
| Step | Status | Commit |
| --- | --- | --- |
| All remaining bodies translated or explicitly opted out | TODO | - |
| Hand-written body count below 30 with per-opcode justification | TODO | - |
| Reproducibility gate tools/cases_generator_reproducibility_test.go green in CI | TODO | - |
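
The reproducibility gate boils down to a byte comparison between committed and freshly regenerated files. The sketch below works on in-memory maps so it stays self-contained; staleFiles and the file names are hypothetical, and a real gate would read the files from disk.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// staleFiles returns, sorted, every generated file whose committed content
// differs from the regenerated content or that exists on only one side.
func staleFiles(committed, regenerated map[string][]byte) []string {
	var stale []string
	for name, want := range regenerated {
		if got, ok := committed[name]; !ok || !bytes.Equal(got, want) {
			stale = append(stale, name)
		}
	}
	for name := range committed {
		if _, ok := regenerated[name]; !ok {
			stale = append(stale, name) // committed file the generator no longer emits
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	committed := map[string][]byte{"a_gen.go": []byte("x"), "b_gen.go": []byte("y")}
	regenerated := map[string][]byte{"a_gen.go": []byte("x"), "b_gen.go": []byte("z")}
	fmt.Println(staleFiles(committed, regenerated))
}
```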

Risks and carve-outs

Body translation is the hard part. Phase 8 is the de-risking step: if the translator subset turns out to be too narrow, the spec lands at Phase 7 (harness only, hand-written bodies) and Phase 8/9 become a follow-on spec. Phase 7 still removes the entire dispatch / cache-layout / family / deopt class of bugs; that is the load-bearing win.

CPython version pinning. The generator runs against one CPython version's bytecodes.c. Bumping CPython (spec 1707) gains explicit steps: vendor the new inputs, run the generator, and fix the diffs in any opcode bodies the translator flagged as manual. The win is that the diff is now mechanical: look at the DSL declarations, look at the generated harness diff, fix the bodies. Today's CPython bumps require manually walking every opcode for cache-layout and stack-effect changes, which is much worse.

Python at build time. The generator is Python, so regenerating needs a working python3.14. CI already has one (spec 1700 / regrtest gate); developer machines need one only to run the regen script. We document it in tools/cases_generator/README.md. The generator runs only on explicit tools/regen-cases.sh invocation; ordinary go build does not invoke it. Generated files are checked in.

Two-source-of-truth windows. During Phases 2-6 each emitter ships with a parity test against the hand-rolled file it replaces. The hand-rolled file is deleted only when the parity test is green. There is no window where two files claim to own the same table; the parity test mediates the switchover.

Debuggability of generated Go. Generated dispatch code is harder to step through than hand-written Go. Mitigation: //line directives in the generator output point each emitted line back to Python/bytecodes.c, so a panic stack trace lands the developer at the DSL declaration, not the generated Go. This mirrors what CPython does for generated_cases.c.h against bytecodes.c.
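
A sketch of how an emitter might interleave //line directives with generated statements. The stmt type, the emit function, and the source line numbers are hypothetical; the directive form //line file:line is standard Go and must start in column 1 to take effect.

```go
package main

import (
	"fmt"
	"strings"
)

// stmt pairs one generated Go statement with the DSL line it came from.
type stmt struct {
	srcLine int
	goCode  string
}

// emit writes each statement preceded by a //line directive that points
// back at the DSL source. The directive is written at the start of a line.
func emit(srcFile string, stmts []stmt) string {
	var b strings.Builder
	for _, s := range stmts {
		fmt.Fprintf(&b, "//line %s:%d\n%s\n", srcFile, s.srcLine, s.goCode)
	}
	return b.String()
}

func main() {
	// Line numbers are made up for the example.
	fmt.Print(emit("Python/bytecodes.c", []stmt{
		{srcLine: 10, goCode: "value := e.locals[oparg]"},
		{srcLine: 11, goCode: "e.push(value)"},
	}))
}
```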

Performance regressions in dispatch. Generated Go may pessimize inlining vs the hand-written switch. The Phase 7 microbench check catches this; if it fails, the fix is generator-side (emit different shapes), not body-side. The hand-written baseline is preserved in git for A/B comparison until Phase 9 closes.

Deprecated source files

Every file below carries a // DEPRECATED (spec 1714): ... banner the day this spec lands. The banner names the phase that deletes the file and the generated file that replaces it. Editing a deprecated file is strongly discouraged once the replacing phase is in flight: any change there has to be reflected in the generator output too, and the deletion sweep at end-of-phase rolls back manual edits anyway.

A "fully deleted" entry means the file disappears from the tree. A "shrinks to" entry means the file survives but its hand-rolled sections (cache writes, family literals, dispatch switches) are removed; what remains is glue too small to be worth generating (typically: a specializer policy function or a frame-setup helper).

| File | Status today | Replaced by | Phase | Disposition |
| --- | --- | --- | --- | --- |
| compile/opcodes_gen.go | Hand-curated despite the _gen suffix | compile/opcode_ids_gen.go | 2 | Fully deleted |
| compile/opcode_caches.go | Hand-rolled cache-size table | compile/opcode_metadata_gen.go | 2 | Fully deleted |
| specialize/cache.go | SetCacheCell / CacheCell raw codeunit access | specialize/cache_layouts_gen.go typed accessors | 3 | Fully deleted (no callers post-migration) |
| specialize/quicken.go | Hand-rolled family literal | specialize/family_gen.go | 4 | Shrinks to policy helpers |
| specialize/deopt.go | Hand-rolled deopt map | specialize/family_gen.go | 4 | Shrinks to deopt-action helpers |
| specialize/binary_op.go | Hand-rolled specializer; raw cache writes | typed cache access (Phase 3) + generated metadata (Phase 4) | 3+4 | Shrinks to specialize policy |
| specialize/call.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/call_kw.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/compare_op.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/contains_op.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/for_iter.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/load_attr.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/load_global.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/load_super_attr.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/send.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/store_attr.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/store_subscr.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/to_bool.go | Same | Same | 3+4 | Shrinks to specialize policy |
| specialize/unpack_sequence.go | Same | Same | 3+4 | Shrinks to specialize policy |
| vm/eval_simple.go | Hand-rolled tier-1 dispatch switch | vm/eval_dispatch_gen.go | 5 | Shrinks to evalLoop scaffolding |
| vm/eval_call.go | Hand-rolled CALL/CALL_KW family bodies | vm/eval_dispatch_gen.go + per-opcode op<NAME> bodies | 5+6 | Shrinks to body helpers |
| vm/eval_resume.go | Hand-rolled RESUME / RESUME_CHECK | generator output | 5+6 | Shrinks to body helpers |
| vm/eval_match.go | Hand-rolled MATCH_* family | generator output | 5 | Shrinks to body helpers |
| vm/eval_import.go | Hand-rolled IMPORT_NAME / IMPORT_FROM | generator output | 5 | Shrinks to body helpers |
| vm/eval_unwind.go | Hand-rolled error-path label dispatch | generator output (error labels) | 5 | Shrinks to error helpers |
| vm/eval_gen.go | Hand-rolled generator-related opcode bodies (SEND, YIELD_VALUE, etc) | generator output | 5+6 | Shrinks to body helpers |
| vm/eval_specialized.go | Hand-rolled dispatch for specialized arms | vm/eval_dispatch_gen.go | 6 | Fully deleted |
| vm/eval_specialized_binary_op.go | Hand-rolled BINARY_OP_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_compare.go | Hand-rolled COMPARE_OP_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_contains.go | Hand-rolled CONTAINS_OP_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_load_global.go | Hand-rolled LOAD_GLOBAL_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_store_attr.go | Hand-rolled STORE_ATTR_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_store_subscr.go | Hand-rolled STORE_SUBSCR_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_tobool.go | Hand-rolled TO_BOOL_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/eval_specialized_unpack.go | Hand-rolled UNPACK_SEQUENCE_* arms | generator + per-opcode bodies | 6 | Shrinks to body helpers |
| vm/tier2.go | Hand-rolled tier-2 trace dispatcher | vm/eval_uops_gen.go | 7 | Shrinks to trace-loop scaffolding |
| optimizer/uops.go | Hand-rolled uop definitions | optimizer/uop_meta_gen.go (regenerated) | 7 | Fully deleted |
| optimizer/uops_impl.go | Hand-rolled uop body implementations | vm/eval_uops_gen.go | 7+8+9 | Fully deleted |
| optimizer/uops_dispatch_gen.go | Generated by a hand-rolled script | vm/eval_uops_gen.go | 7 | Fully deleted |
| optimizer/uops_stubs_gen.go | Generated by a hand-rolled script | generator output | 7 | Fully deleted |
| optimizer/uop_ids_gen.go | Generated by a hand-rolled script | optimizer/uop_ids_gen.go (regenerated through cases_generator) | 7 | File survives, content fully regenerated |
| optimizer/uop_meta_gen.go | Same | Same | 7 | File survives, content fully regenerated |
| optimizer/analysis.go | Hand-rolled abstract interpreter cases | compile/optimizer_cases_gen.go | 7 | Shrinks to analysis-driver scaffolding |

Out of scope

  • Computed-goto dispatch (target_generator.py analogue). Go does not support computed goto. The dispatch switch is the best we can do; CPython's opcode_targets.h has no gopy analogue. This is a documented carve-out, not a TODO.
  • Python-side metadata (py_metadata_generator.py → Lib/_opcode_metadata.py). gopy already vendors that file via 1710 T5.1. We ship CPython's directly; no regeneration.
  • PEP 659 specializer skeletons. Each specialize/<family>.go has a hand-written specialize/unspecialize policy that decides when to upgrade an opcode. The decision logic stays hand-written; only the cache layout and dispatch harness become generated. CPython's generator does not emit the policy either; it emits the plumbing the policy uses.

Checklist

  • Phase 0.1 — vendor CPython's Tools/cases_generator/ under Tools/cases_generator/
  • Phase 0.2 — mirror bytecodes.c, optimizer_bytecodes.c, pycore_code.h under Tools/cases_generator/inputs/
  • Phase 0.3 — Tools/regen-cases/ (Go driver) invokes upstream generators into a scratch dir
  • Phase 0.4 — go run ./Tools/regen-cases --check-upstream diff-clean vs CPython 3.14.5
  • Phase 0.5 — CI job cases-generator-upstream-parity green
  • Phase 1.1 — gowriter.py mirrors cwriter.py API for Go output
  • Phase 1.2 — go_generators_common.py binds the constant DSL macros to Go
  • Phase 1.3 — 30-snippet golden corpus under Tools/cases_generator/testdata/snippets/ (20/30 landed; remaining 10 stage with Phase 5 op signatures)
  • Phase 1.4 — TestSnippetParity green
  • Phase 2.1 — gopy_opcode_id_generator.py emits compile/opcode_ids_gen.go
  • Phase 2.2 — gopy_opcode_metadata_generator.py emits compile/opcode_metadata_gen.go
  • Phase 2.3 — parity test vs compile/opcodes_gen.go + compile/opcode_caches.go
  • Phase 2.4 — delete compile/opcode_caches.go; redirect references
  • Phase 3.1 — cache_struct_parser.py parses pycore_code.h struct definitions
  • Phase 3.2 — specialize/cache_layouts_gen.go covers every _Py<Op>Cache
  • Phase 3.3 — migrate specialize/load_global.go + vm/eval_specialized_load_global.go to typed accessors
  • Phase 3.4 — migrate every other specialize/*.go + vm/eval_specialized_*.go
  • Phase 3.5 — delete specialize.SetCacheCell / CacheCell (no callers)
  • Phase 3.6 — TestCacheLayoutSize green
  • Phase 4.1 — generator emits specialize/family_gen.go (family + deopt tables)
  • Phase 4.2 — specialize/quicken.go + specialize/deopt.go consume the generated tables
  • Phase 4.3 — parity test green; literal tables deleted
  • Phase 5.1 — tier-1 emitter (Go-side Tools/bytecodes_gen in lieu of gopy_tier1_generator.py) emits vm/eval_dispatch_gen.go for unspecialized opcodes (107 arms, bodies stubbed pending Phase 8 action translator)
  • Phase 5.2 — every opcode body in vm/eval_simple.go migrated to a typed op<NAME> function (43 / ~118 opcodes routed through dispatchGen via the dispatchGenSupported whitelist; see the Phase 5.2 audit table for the per-opcode commit stamp)
  • Phase 5 Bucket A6.1 — _Py_ID(NAME) translates to objects.NewStr("NAME")
  • Phase 5 Bucket A6.2 — out-param int err = HELPER(args..., &out) translates to Go multi-return
  • Phase 5 Bucket A6.3 — _PyErr_SetString carries the literal message through setPendingErr
  • Phase 5 Bucket B1 — PyMapping_GetOptionalItem → objects.MappingGetOptionalItem; flips LOAD_BUILD_CLASS (55440dc)
  • Phase 5 Bucket B2 — PyDict_New registered as expression helper; EvalCode defaults f.Locals = globals so module-frame LOCALS() matches CPython; flips SETUP_ANNOTATIONS (07aa060)
  • Phase 5 Bucket B3 — PyNumber_Negative / PyNumber_Invert helpers; flips UNARY_NEGATIVE, UNARY_INVERT (e2c5275)
  • Phase 5 Bucket B4 — PyObject_Format helper; flips FORMAT_WITH_SPEC (67735f0)
  • Phase 5 Bucket B5 — LOCALS() → e.frame.Locals(); flips LOAD_LOCALS (a7a4f7f)
  • Phase 5 Bucket B6 — PyObject_GetIter already wired through e.objectGetIter; flips GET_ITER (02e72c3)
  • Phase 5 Bucket B7 — _PyList_FromStackRefStealOnSuccess / _PyTuple_FromStackRefStealOnSuccess / PySlice_New already wired (listFromStackRef / tupleFromStackRef / sliceNew); flips BUILD_LIST, BUILD_TUPLE, BUILD_SLICE (b0819a5)
  • Phase 5 Bucket B8 — _PyDict_FromItems / _PyTemplate_Build already wired (dictFromItems / templateBuild); flips BUILD_MAP, BUILD_TEMPLATE (60c7912)
  • Phase 5 Bucket B9 — _PyEval_GetAwaitable / _PyEval_GetANext already wired (getAwaitable / getANext); flips GET_AWAITABLE, GET_ANEXT (419072c)
  • Phase 5.3 — vm/eval_simple.go shrinks to evalLoop scaffolding only
  • Phase 5.4 — go test ./vm green
  • Phase 6.1 — specialized cases emitted in vm/eval_dispatch_gen.go
  • Phase 6.2 — each vm/eval_specialized_*.go shrinks to body functions
  • Phase 6.3 — LOAD_GLOBAL cache-boundary regression test
  • Phase 6.4 — per-family boundary tests for ~10 other specialized families
  • Phase 7.1 — gopy_tier2_generator.py emits vm/eval_uops_gen.go
  • Phase 7.2 — shared-body parity test (tier-1 LOAD_FAST ≡ tier-2 _LOAD_FAST)
  • Phase 7.3 — remaining ~270 uops emitted
  • Phase 7.4 — 1712 microbench ±2% before/after
  • Phase 8.1 — body_translator.py covers the Phase 8 subset
  • Phase 8.2 — 10 pilot opcodes have zero hand-written Go body
  • Phase 8.3 — SUBSET.md lists covered + uncovered constructs
  • Phase 9.1 — every remaining body translated or explicitly opted out
  • Phase 9.2 — hand-written body count below 30 with per-opcode justification
  • Phase 9.3 — test/gate/cases_generator_reproducibility_test.go runs tools/regen-cases.sh && git diff --exit-code in CI
  • Spec 1714 final gate — generator reproducibility green; spec 1713 resumed
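
The typed-accessor and cache-boundary items (Phases 3.2, 3.6, and 6.3) can be pictured with the following sketch. It is an illustration, not the generated specialize/cache_layouts_gen.go: the names `loadGlobalCache`, `cacheCells`, `writeLoadGlobalCache`, and `readLoadGlobalCache` are hypothetical, and the layout assumes that `_PyLoadGlobalCache` in pycore_code.h consists of four 16-bit fields (counter, index, module keys version, builtin keys version). Every access goes through a bounds-checked window over the instruction's own cache cells, so a write can never reach the next instruction.

```go
package main

import (
	"errors"
	"fmt"
)

// loadGlobalCacheCells is the assumed number of 16-bit code units that
// follow a LOAD_GLOBAL instruction, one per struct field below.
const loadGlobalCacheCells = 4

// loadGlobalCache is a hypothetical Go mirror of _PyLoadGlobalCache.
type loadGlobalCache struct {
	Counter            uint16
	Index              uint16
	ModuleKeysVersion  uint16
	BuiltinKeysVersion uint16
}

var errCacheOutOfBounds = errors.New("cache access outside the instruction's cache cells")

// cacheCells returns the cache window for the instruction at code[pc].
// The three-index slice caps capacity so callers cannot grow past it.
func cacheCells(code []uint16, pc int) ([]uint16, error) {
	start := pc + 1
	end := start + loadGlobalCacheCells
	if pc < 0 || end > len(code) {
		return nil, errCacheOutOfBounds
	}
	return code[start:end:end], nil
}

// writeLoadGlobalCache stores c into the cache cells after code[pc].
func writeLoadGlobalCache(code []uint16, pc int, c loadGlobalCache) error {
	cells, err := cacheCells(code, pc)
	if err != nil {
		return err
	}
	cells[0], cells[1] = c.Counter, c.Index
	cells[2], cells[3] = c.ModuleKeysVersion, c.BuiltinKeysVersion
	return nil
}

// readLoadGlobalCache loads the cache cells after code[pc].
func readLoadGlobalCache(code []uint16, pc int) (loadGlobalCache, error) {
	cells, err := cacheCells(code, pc)
	if err != nil {
		return loadGlobalCache{}, err
	}
	return loadGlobalCache{cells[0], cells[1], cells[2], cells[3]}, nil
}

func main() {
	// One instruction, its four cache cells, then a sentinel standing in
	// for the next instruction (0xBEEF is an arbitrary marker value).
	code := []uint16{0x0001, 0, 0, 0, 0, 0xBEEF}
	want := loadGlobalCache{Counter: 1, Index: 7, ModuleKeysVersion: 11, BuiltinKeysVersion: 13}
	if err := writeLoadGlobalCache(code, 0, want); err != nil {
		panic(err)
	}
	got, err := readLoadGlobalCache(code, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", got == want)
	fmt.Println("next instruction intact:", code[5] == 0xBEEF)
}
```

A boundary regression test in the spirit of Phase 6.3 would assert both halves: the round trip succeeds, and the unit after the cache is unchanged. Because the specializer and the VM would share the same accessor, they cannot silently agree on a wrong offset.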