Performance
The short version: gopy is slower than CPython 3.14 on every benchmark we run. The geometric-mean slowdown on the small benchmark set is in the low hundreds. The gap is closing, mostly via Tier-2 uop coverage and continued specialization work.
How to read this page
- "Slower than CPython" means wall time per benchmark, measured in-process, single-threaded.
- "Speedup" numbers are CPython time divided by gopy time. A speedup of 0.01 means gopy is 100x slower than CPython.
- gopy uses the same algorithms as CPython by design. The difference is implementation overhead, not algorithmic.
Where the time goes
The eval loop dominates. gopy's switch-based dispatch is in Go; CPython's is in tuned C with computed gotos. Until the Tier-2 optimizer covers more uops, hot loops stay in the slow path.
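To make the dispatch cost concrete, here is a minimal sketch of what a Tier-1 switch-based eval loop looks like in Go. The opcodes, `instr` struct, and `evalSketch` function are illustrative inventions, not gopy's real instruction set; the point is that every instruction pays for the `switch` and the loop bookkeeping, where CPython's C loop jumps directly between handlers with computed gotos.

```go
package main

import "fmt"

// Hypothetical opcodes for illustration only.
const (
	opLoadConst = iota
	opAdd
	opReturn
)

type instr struct {
	op  int
	arg int
}

// evalSketch shows the shape of switch-based dispatch: one Go switch
// per instruction executed, plus slice bookkeeping for the stack.
func evalSketch(code []instr, consts []int) int {
	stack := make([]int, 0, 8)
	for pc := 0; pc < len(code); pc++ {
		switch code[pc].op {
		case opLoadConst:
			stack = append(stack, consts[code[pc].arg])
		case opAdd:
			n := len(stack)
			stack[n-2] += stack[n-1]
			stack = stack[:n-1]
		case opReturn:
			return stack[len(stack)-1]
		}
	}
	return 0
}

func main() {
	// Equivalent of evaluating 1 + 2.
	code := []instr{{opLoadConst, 0}, {opLoadConst, 1}, {opAdd, 0}, {opReturn, 0}}
	fmt.Println(evalSketch(code, []int{1, 2})) // prints 3
}
```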
Secondary costs, in rough order of impact:
- Interface dispatch on every object operation. Go's interface calls are more expensive than CPython's slot table lookups.
- Reference counting via atomic ops in some paths. CPython uses non-atomic counters in single-threaded code.
- map[string]Object for dict storage instead of CPython's hand-tuned hash table with combined / split layouts.
- Boxed integers and floats. Small-int caching helps but doesn't eliminate allocation.
None of these are fundamental. They are engineering work, listed roughly in the order we expect to attack them.
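The boxing cost is the easiest to illustrate. The sketch below shows the small-int caching technique in Go under assumed names (`Object`, `Int`, `newInt` are illustrative, not gopy's real types): values in CPython's traditional [-5, 256] range share preallocated boxes, while everything else allocates per result.

```go
package main

import "fmt"

// Object stands in for the interpreter's value interface (hypothetical).
type Object interface{ Type() string }

type Int struct{ Value int64 }

func (i *Int) Type() string { return "int" }

// smallInts caches boxed integers for [-5, 256], mirroring CPython:
// 262 values share one allocation each for the process lifetime.
var smallInts [262]*Int

func init() {
	for i := range smallInts {
		smallInts[i] = &Int{Value: int64(i - 5)}
	}
}

func newInt(v int64) *Int {
	if v >= -5 && v <= 256 {
		return smallInts[v+5] // cached: no allocation
	}
	return &Int{Value: v} // boxed: one heap allocation per result
}

func main() {
	fmt.Println(newInt(7) == newInt(7))       // same cached pointer: true
	fmt.Println(newInt(1000) == newInt(1000)) // distinct allocations: false
}
```

This is why caching "helps but doesn't eliminate allocation": any arithmetic whose result leaves the cached range still boxes.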
Running the benchmarks yourself
The repo ships a small benchmark harness:
make bench
This runs a curated set of Python programs under both gopy and
CPython 3.14, then writes a comparison table to bench/results.md.
Source for each benchmark lives in bench/programs/.
The benchmarks cover:
- Microbenchmarks. Empty loop, attribute access, integer arithmetic, list comprehension, dict access.
- Algorithmic. N-queens, Fannkuch, Sieve of Eratosthenes, binary trees, deltablue.
- Stdlib-heavy. JSON encode/decode, regex match, string formatting.
What is improving
- Tier-2 uop bodies. As more uops are hand-ported from Python/executor_cases.c.h, hot loops escape Tier-1 dispatch. Coverage is roughly 14 of 285 uops today.
- Specializer tuning. PEP 659 is wired but the heuristics that pick specialised forms have not been retuned for gopy's cost model.
- Dict layout. A combined / split hashtable replacing the map[string]Object is on the roadmap.
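For a sense of what a combined layout buys over map[string]Object, here is a minimal sketch in the spirit of CPython's dict: a dense, insertion-ordered entries array plus a sparse index table with linear probing. All names (`dict`, `entry`, `set`, `get`) are illustrative, and resizing and deletion are omitted; this is not gopy's planned implementation.

```go
package main

import "fmt"

// entry packs hash, key, and value together, so a lookup touches
// one cache line per probe and iteration order is insertion order.
type entry struct {
	hash  uint64
	key   string
	value any
}

type dict struct {
	indices []int   // sparse probe table; -1 means empty slot
	entries []entry // dense, insertion-ordered
}

// newDict requires size to be a power of two for the mask trick below.
func newDict(size int) *dict {
	idx := make([]int, size)
	for i := range idx {
		idx[i] = -1
	}
	return &dict{indices: idx}
}

// strHash is FNV-1a, chosen here only for simplicity.
func strHash(s string) uint64 {
	var h uint64 = 14695981039346656037
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= 1099511628211
	}
	return h
}

func (d *dict) set(key string, value any) {
	h := strHash(key)
	mask := uint64(len(d.indices) - 1)
	for slot := h & mask; ; slot = (slot + 1) & mask { // linear probing
		i := d.indices[slot]
		if i == -1 {
			d.indices[slot] = len(d.entries)
			d.entries = append(d.entries, entry{h, key, value})
			return
		}
		if d.entries[i].hash == h && d.entries[i].key == key {
			d.entries[i].value = value
			return
		}
	}
}

func (d *dict) get(key string) (any, bool) {
	h := strHash(key)
	mask := uint64(len(d.indices) - 1)
	for slot := h & mask; ; slot = (slot + 1) & mask {
		i := d.indices[slot]
		if i == -1 {
			return nil, false
		}
		if d.entries[i].hash == h && d.entries[i].key == key {
			return d.entries[i].value, true
		}
	}
}

func main() {
	d := newDict(8)
	d.set("x", 1)
	d.set("y", 2)
	v, ok := d.get("x")
	fmt.Println(v, ok) // 1 true
}
```

The split variant shares one keys/indices structure across many instances with the same key set, which is what makes instance-dict storage cheap; the sketch above only shows the combined half.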
What will not change
- No JIT compilation to native code. gopy is and will remain an interpreter (with Tier-2 traces, which stay in the interpreter).
- No goroutine-level parallelism within one interpreter. PEP 684 (per-interpreter GIL) is on the roadmap, but the GIL stays for single-interpreter workloads.
When the gap matters and when it doesn't
The gap matters for CPU-bound numerical or string-crunching workloads. There gopy is the wrong tool. Use CPython, or rewrite the hot path in Go and call it from Python via a built-in module.
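The shape of "rewrite the hot path in Go" can be sketched as below. The registry, the `GoFunc` signature, and `fast_sum` are all hypothetical; gopy's actual builtin-module API may look quite different. The point is only that the inner loop runs as plain Go, with no per-element interpreter dispatch.

```go
package main

import "fmt"

// Object stands in for the interpreter's value type (hypothetical).
type Object any

// GoFunc is an assumed shape for a builtin: Python-level args in,
// one result out.
type GoFunc func(args ...Object) Object

// A tiny builtin registry, purely illustrative.
var builtins = map[string]GoFunc{}

func register(name string, fn GoFunc) { builtins[name] = fn }

func init() {
	// The hot path in Go: sum integers with a native loop instead of
	// bytecode dispatch per element.
	register("fast_sum", func(args ...Object) Object {
		total := 0
		for _, a := range args {
			total += a.(int)
		}
		return total
	})
}

func main() {
	// From Python this would look like: fast_sum(1, 2, 3)
	fmt.Println(builtins["fast_sum"](1, 2, 3)) // 6
}
```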
The gap rarely matters for:
- Configuration evaluation. A few hundred microseconds of parse + compile + run is invisible inside a request that already does database I/O.
- DSL-style rule engines. The rules typically run a few hundred ops total; even 100x slowdown is still under a millisecond.
- Code generation and templating. I/O dominates.
Reference
- bench/. The benchmark harness.
- Status -> Active work for what is being optimised right now.
- The CPython 3.14 What's New for the baseline gopy targets.