v0.12.3 - The io subsystem and deferred annotations

Released May 15, 2026.

v0.12.2 was the import-chain release. It shimmed less and ported more, but the deepest single dependency in that chain, the _io C extension, still carried a stack of partial ports. bytesio.c, stringio.c, fileio.c, bufferedio.c, iobase.c, and the 3,500-line textio.c had each been audited once and patched up to the surface area we needed for import unittest to work. Not enough to claim the subsystem.

v0.12.3 is the cleanup pass. Every file under Modules/_io/ is now a 1:1 Go port with a citation per function. The codec layer underneath TextIOWrapper is real, stateful, and snapshot-aware, which means tell and seek finally produce the 21-byte cookies CPython produces rather than int64 stand-ins that broke as soon as a utf-16 stream went mid-character.

Two other themes land alongside the io drop.

PEP 649 and PEP 749 deferred annotations. Class, function, and module bodies now compile __annotate__ functions the way CPython 3.14 does. Lib/annotationlib.py is vendored end to end and the lazy __annotations__ descriptor resolves through it. This was spec 1706, eight phases, all in this cut.

Object protocol full port, phases 2 through 8. v0.12.2 closed Phase 1 (Objects/object.c). v0.12.3 closes the rest: funcobject.c (classmethod, staticmethod, function), classobject.c (bound method), typeobject.c (type_new pipeline and inherit_slots), and the STORE_NAME / LOAD_NAME / DELETE_NAME opcodes in Python/ceval.c. Issues #544 (enum import), #543 (re import end to end), #542 (fnmatch delegate), and #510 (the re/_sre port) all close as a result.

A handful of CI and VM correctness fixes ride along: exceptions now carry a real __traceback__ chain that traceback.format_exc() can walk, the bytecode assembler emits a per-instruction location table that matches Python/assemble.c, and a 3.14.5 sync audit (spec 1707) pulls the upstream pin to the latest patch release.

Highlights

Three pieces of work define this release.

_io in full

Every file under Modules/_io/ ships as a 1:1 port. Citations live inline as // CPython: Modules/_io/foo.c:NNN function_name and the spec table at website/docs/specs/1700/1702_*.md tracks the coverage row by row.

| File | Port | Notes |
| --- | --- | --- |
| bytesio.c | module/io/bytesio.go | BytesIO with the upstream buffer growth policy. |
| stringio.c | module/io/stringio.go | StringIO with universal-newline translation. |
| fileio.c | module/io/fileio.go | FileIO with raw read, readinto, write, truncate, isatty, seek, tell. |
| bufferedio.c | module/io/bufferedio.go | The unified-slab buffered model: BufferedReader, BufferedWriter, BufferedRandom, BufferedRWPair. Read-to-write transition invalidates the read buffer the way CPython does. |
| iobase.c | module/io/iobase.go | _IOBase, _RawIOBase, _BufferedIOBase abstract bases, including the __del__ warning emission for unclosed files. |
| textio.c | module/io/textiowrapper.go | _TextIOBase, IncrementalNewlineDecoder, TextIOWrapper plus the codec / read-chunk / tell-seek / reconfigure internals from spec 1709. |
| _iomodule.c | module/io/iomodule.go | io.open dispatch, _UnsupportedOperation, BlockingIOError wiring. |
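
The io.open dispatch in the last row decides which layer of the stack the caller gets back. A minimal sketch of what that looks like from Python, assuming CPython-compatible behaviour; opened_as is a hypothetical helper written for this example, not part of the port:

```python
import io
import os
import tempfile


def opened_as(path, mode, **kwargs):
    """Open path and report the class of the object io.open hands back."""
    with io.open(path, mode, **kwargs) as stream:
        return type(stream).__name__


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with io.open(path, "wb") as f:
        f.write(b"abc")

    results = {
        "rb": opened_as(path, "rb"),                    # buffered binary read
        "rb raw": opened_as(path, "rb", buffering=0),   # no buffer layer at all
        "r+b": opened_as(path, "r+b"),                  # read and write on one buffer
        "r": opened_as(path, "r", encoding="utf-8"),    # text layer on top
        "wb": opened_as(path, "wb"),                    # buffered binary write
    }

for mode, cls in results.items():
    print(mode, "->", cls)
```

The same file yields five different classes depending only on the mode string and the buffering argument, which is why the dispatch has to be ported together with every class it can return.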

The interesting one is textio.c. Spec 1709 broke its internals into four phases that landed back to back in PR #57:

  1. Stateful codec layer. IncrementalDecoder / IncrementalEncoder interfaces with utf-8 (carrying a 4-byte tail for partial sequences), ascii, latin-1, utf-16 and utf-32 with BOM sniffing and endianness encoded into dec_flags, and 8-bit charmap variants. GetState and SetState mirror CPython's tuple protocol so a decoder mid-stream can be snapshotted, serialized into a tell cookie, and restored from a later seek. CPython: Modules/_io/textio.c:912 _PyCodecInfo_GetIncrementalDecoder.

  2. Read pipeline. readChunk drives one read1 / decode cycle, snapshotting (buf pos, decoder buffer, dec_flags, newline pendingcr/seennl) before feeding bytes through. The bytes-to-chars ratio smooths as b2cratio = 0.625 * ratio + 0.375 * prev, matching the CPython adaptive sizing exactly. read, read1, and readline all go through drainAll / drainN / drainLine helpers that consume the snapshot-protected chunk stream. CPython: Modules/_io/textio.c:1853 _textiowrapper_read_chunk.

  3. Tell / seek cookie. A real 21-byte cookie struct, packed little-endian via math/big.Int:

    | Field | Type | Meaning |
    | --- | --- | --- |
    | start_pos | u64 | Underlying stream offset at chunk start. |
    | dec_flags | i32 | Decoder-specific snapshot (BOM state, endianness). |
    | bytes_to_feed | i32 | Bytes to replay through the decoder. |
    | chars_to_skip | i32 | Decoded chars to discard from the front. |
    | need_eof | u8 | Whether to push final = true on the replay. |

    TellCookie returns *big.Int; SeekCookie accepts one. The int64 Seek / Tell stay as convenience wrappers but the Python-level seek / tell route through objects.NewIntFromBig and BigInt(). CPython: Modules/_io/textio.c:2387 cookie_type.

  4. Reconfigure. _io_TextIOWrapper_reconfigure_impl (textio.c:1370) raises ValueError if you change codec or newline policy mid-stream while there is read-ahead or write-ahead pending. The new Go method enforces the same invariants and rebuilds the decoder, encoder, and newline decoder when the swap is legal.
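
The cookie layout from phase 3 can be sketched in a few lines of Python. This is an illustration of the field table only (8 + 4 + 4 + 4 + 1 = 21 bytes, little-endian), not the Go TellCookie code; pack_cookie and unpack_cookie are hypothetical names chosen for this example.

```python
import struct

# u64, i32, i32, i32, u8 with no padding: 21 bytes, little-endian.
COOKIE_FORMAT = "<QiiiB"
COOKIE_SIZE = struct.calcsize(COOKIE_FORMAT)


def pack_cookie(start_pos, dec_flags=0, bytes_to_feed=0, chars_to_skip=0, need_eof=False):
    """Pack the five fields and read the 21 bytes back as one unbounded int."""
    raw = struct.pack(
        COOKIE_FORMAT, start_pos, dec_flags, bytes_to_feed, chars_to_skip, int(need_eof)
    )
    return int.from_bytes(raw, "little")


def unpack_cookie(cookie):
    """Inverse of pack_cookie: split the int back into its five fields."""
    raw = cookie.to_bytes(COOKIE_SIZE, "little")
    start_pos, dec_flags, bytes_to_feed, chars_to_skip, need_eof = struct.unpack(
        COOKIE_FORMAT, raw
    )
    return start_pos, dec_flags, bytes_to_feed, chars_to_skip, bool(need_eof)


plain = pack_cookie(10)               # only start_pos set
flagged = pack_cookie(0, dec_flags=1)  # decoder state set, offset zero
fields = unpack_cookie(pack_cookie(1024, 2, 3, 1, True))
print(COOKIE_SIZE, plain, flagged > 2**63 - 1, fields)
```

Because start_pos occupies the low eight bytes, a cookie with no decoder state is just the byte offset. Any non-zero dec_flags pushes the value past what an int64 can hold, which is the reason the Go side carries it as *big.Int.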

The final gate is TestTextIOTellSeekRoundTripUTF16: open a utf-16 BytesIO, read "ab", call TellCookie, drain "cdef", call SeekCookie back to the saved position, re-read and assert "cdef". That sequence used to silently desync mid-stream because the old shim discarded the decoder state across the seek; it now matches CPython byte for byte.
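
The same sequence can be written against the Python-level API. A sketch, assuming a CPython-compatible io module; utf16_round_trip is a name made up for this example and is not the Go test itself:

```python
import io


def utf16_round_trip(text="abcdef", head=2):
    """Read a prefix, save the tell cookie, drain, seek back, and read again."""
    raw = io.BytesIO(text.encode("utf-16"))
    stream = io.TextIOWrapper(raw, encoding="utf-16")

    first = stream.read(head)   # "ab"
    cookie = stream.tell()      # opaque int: treat it as a token, not an offset
    rest = stream.read()        # drains "cdef"

    stream.seek(cookie)         # restores both the byte offset and the decoder state
    replay = stream.read()
    return first, rest, replay, cookie


first, rest, replay, cookie = utf16_round_trip()
print(first, rest, replay)
```

The cookie is only meaningful when handed back to seek on the same stream; arithmetic on it is not supported.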

Two smaller pieces of textio polish landed earlier in the cycle. PR #50 ported _textiowrapper_writeflush and the pending-bytes batching so write accumulates short writes into one underlying flush rather than one syscall per character. PR #55 gave every _io instance a real per-instance __dict__ and put __enter__ / __exit__ as descriptors on the type, which is what stdlib/contextlib.py was reaching for when with open(...) as f: used to silently bypass the context manager protocol.

PEP 649 / 749 deferred annotations (spec 1706)

CPython 3.14 reshaped how annotations work. The old model was __annotations__ = {'x': int}, evaluated eagerly when the class body ran or the function was defined. The new model is __annotate__ = lambda fmt: {'x': int}, evaluated lazily, with three formats (VALUE, FORWARDREF, STRING) the consumer picks at access time. This unblocks forward references, postponed evaluation, and the from __future__ import annotations retirement.

Spec 1706 ports the whole pipeline in eight phases. All eight ship in v0.12.3.

  • Phase 1. Python/symtable.c learns annotation blocks. ann_scope is tracked separately so a name referenced only in an annotation does not pollute the enclosing function's co_names.
  • Phase 2. Python/codegen.c codegen_annassign rewrites annotated-assignment statements to record-only form. The annotation expression compiles into the __annotate__ body rather than into the surrounding scope.
  • Phase 3. Python/codegen.c learns to build the __annotate__ function: a code object whose body is the deferred annotation expressions, with one parameter (the format selector) and one return value (the resolved annotations dict).
  • Phase 4. Python/codegen.c body hook plus the CO_FUTURE_ANNOTATIONS short-circuit. If the module has the retired future-flag, we still honor the old eager-string behavior for one more release.
  • Phase 5. Objects/typeobject.c gets the lazy __annotations__ getset that triggers __annotate__(VALUE) on first read and caches the result on the instance dict.
  • Phase 6. Objects/funcobject.c mirrors the type-level getset for function objects.
  • Phase 7. Objects/moduleobject.c mirrors it again for module objects.
  • Phase 8. Lib/annotationlib.py is vendored byte for byte. The ForwardRef, Format, get_annotations, call_evaluate_function, call_annotate_function surface lands as upstream-identical Python, including the t-string pipeline for STRING format.
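
The call-once-then-cache behaviour behind phases 5 through 7 can be modelled in plain Python. The class below is a toy stand-in written for this note, not the real getset; LazyAnnotations, FORMAT_VALUE, and annotate_tree are hypothetical names, and the sketch runs on interpreters that predate PEP 649.

```python
FORMAT_VALUE = 1  # illustrative constant standing in for the VALUE format selector


class LazyAnnotations:
    """Toy model of a lazy __annotations__ slot: call the annotate function once, then cache."""

    def __init__(self, annotate):
        self._annotate = annotate  # plays the role of __annotate__
        self._cache = None
        self.calls = 0

    def read(self):
        if self._cache is None:
            self.calls += 1
            self._cache = self._annotate(FORMAT_VALUE)
        return self._cache


# The function mentions Tree before Tree exists. That is fine because the
# name is only looked up when the function is finally called.
def annotate_tree(fmt):
    return {"left": Tree, "right": Tree}


slot = LazyAnnotations(annotate_tree)


class Tree:
    pass


first = slot.read()
second = slot.read()
print(first, slot.calls)
```

Deferring the lookup is what makes the forward reference harmless: by the time anything reads the slot, the class body has finished and the name resolves.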

Net effect:

# from __future__ import annotations is no longer required

class Tree:
    left: Tree  # forward reference, no NameError
    right: Tree
    parent: Tree | None

# Eager access still works.
print(Tree.__annotations__)
# {'left': <class '__main__.Tree'>, 'right': ..., 'parent': ...}

# Lazy access through annotationlib works too.
import annotationlib
print(annotationlib.get_annotations(Tree, format=annotationlib.Format.STRING))
# {'left': 'Tree', 'right': 'Tree', 'parent': 'Tree | None'}

Object protocol full port, phases 2 through 8 (spec 1704)

v0.12.2 closed Phase 1 (the Objects/object.c method and getset tables). v0.12.3 closes the rest of the spec. The deliverable, as laid out in v0.12.2, is "every function in the C file has a Go counterpart with a citation". After this release we never have to come back to these files looking for a missing slot.

  • Phase 2. Objects/funcobject.c PyClassMethod_Type. Full port of cm_init, cm_descr_get, cm_repr, cm_traverse, the member list for __func__ and __wrapped__, the getset list for __isabstractmethod__, __dict__, __annotations__, __annotate__. __set_name__ forwarding lands here so classmethod decorating a method picks up the owning class's name correctly.
  • Phase 3. Objects/funcobject.c PyStaticMethod_Type. Mirror of Phase 2 plus sm_call so staticmethod(f)(x) works directly without bouncing through .__func__.
  • Phase 4. Objects/funcobject.c PyFunction_Type. Audit of the full func_* table and getset list. __qualname__, __defaults__, __kwdefaults__, __closure__, __module__, __globals__, __code__, __dict__, __doc__, __annotations__, __annotate__, __type_params__ all resolve through real descriptors. func.__type_params__ in particular was missing entirely.
  • Phase 5. Objects/classobject.c PyMethod_Type. Bound-method port including method_richcompare, method_hash, method_repr, the full getset table for __func__, __self__, __doc__, __name__, __module__, __qualname__. __func__ is now a getset rather than a member, which means subclasses that try to override the slot work the same way they do in CPython.
  • Phase 6. Objects/typeobject.c type_new pipeline. Every function in the type_new_* family: type_new_set_attrs, type_new_set_bases, type_new_set_names, type_new_init_subclass, type_new_alloc, type_new_impl. This is the codegen-side of class C(Base, metaclass=M, **kw): ... and getting it right means every metaclass-related corner case (overridden __init_subclass__, __set_name__ hooks, the __class_getitem__ propagation) follows CPython exactly.
  • Phase 7. Objects/typeobject.c inherit_slots. The audit. Every slot edge (tp_as_number, tp_as_sequence, tp_as_mapping, tp_as_async, tp_as_buffer) is walked and inherited through the same propagation rules CPython uses. The pre-existing code worked for the common cases but had stale logic on the rare slots; the audit replaces it wholesale.
  • Phase 8. Python/ceval.c STORE_NAME / LOAD_NAME / DELETE_NAME. The fast-path-vs-protocol split CPython does for dict subclasses used as the class namespace. Class bodies whose namespace was a dict subclass with a custom __setitem__ (think enum.EnumDict) used to silently skip the subclass hook. Now the path matches CPython exactly: an exact dict gets the inline fast path, anything else goes through PyObject_SetItem.

This phase set is what closes #544 (enum import). The enum.EnumDict class is a dict subclass with a __setitem__ that intercepts member assignment, and the broken STORE_NAME path was the last piece silently dropping its work.
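
The failure mode is easy to reproduce with a small metaclass. This is a sketch in the spirit of enum.EnumDict, not the enum implementation; RecordingNamespace, Meta, and Color are hypothetical names used only here.

```python
class RecordingNamespace(dict):
    """A dict subclass whose __setitem__ must see every class-body assignment."""

    def __init__(self):
        super().__init__()
        self.assigned = []

    def __setitem__(self, key, value):
        self.assigned.append(key)
        super().__setitem__(key, value)


class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes with this object as its namespace.
        return RecordingNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep only user-defined names; the interpreter also stores dunder entries.
        cls.assigned = [k for k in namespace.assigned if not k.startswith("__")]
        return cls


class Color(metaclass=Meta):
    RED = 1
    GREEN = 2


print(Color.assigned)
```

If STORE_NAME took the dict fast path for this namespace, the assignments would still land in the mapping but __setitem__ would never run, and Color.assigned would come back empty.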

What's new

The full breakdown, grouped by where it landed.

module/io subsystem (spec 1702 finalization)

The io subsystem entered v0.12.2 with the surface area unittest needed and a backlog of half-ported functions. v0.12.3 closes the backlog. The spec table at website/docs/specs/1700/1702_*.md was scrubbed of false-positive "done" flips and walked row by row.

  • bytesio.c -> module/io/bytesio.go (PR #28). Buffer growth policy follows Modules/_io/bytesio.c:_io_BytesIO_write_impl so capacity doubles on overflow with a 256-byte floor.
  • stringio.c -> module/io/stringio.go (PR #29). Universal newline translation through the IncrementalNewlineDecoder from textio.c, not a separate translator.
  • fileio.c -> module/io/fileio.go (PR #30). Raw FD-backed IO. The CPython port treats the platform blksize hint as an optimization signal only; we dropped the platform helpers because Go's os.File.Read already chooses a sane block size.
  • bufferedio.c -> module/io/bufferedio.go (PR #31). This is the big one. Rewritten on CPython's unified-slab buffer model: one allocation per stream, read_end / write_end / read_pos / raw_pos cursors, the read-to-write transition rule (a read that follows a write must seek the underlying raw back to raw_pos, and a write that follows a read invalidates the read buffer). Adds repr, iter protocol, context manager exits.
  • iobase.c -> module/io/iobase.go (PR #32). _IOBase, _RawIOBase, _BufferedIOBase abstract bases. __del__ emits a ResourceWarning when an unclosed stream is collected, matching the upstream warning text exactly.
  • textio.c -> module/io/textiowrapper.go (PR #34 and follow-ups #50, #55, #56, #57). The internals rework that spec 1709 covers.
  • _iomodule.c -> module/io/iomodule.go (PR #35). io.open dispatch, _UnsupportedOperation, BlockingIOError wiring, the open argument parser including closefd and opener.
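
Two of the behaviours in this list, the read-to-write transition and the held-back carriage return in the newline decoder, can be exercised from Python. A sketch assuming CPython-compatible semantics:

```python
import io

# Read-to-write transition on a single buffer.
raw = io.BytesIO(b"abcdef")
stream = io.BufferedRandom(raw)

head = stream.read(2)           # fills the read buffer; the raw position runs ahead
stream.write(b"XY")             # must land at logical offset 2, not at the raw position
stream.flush()
after_write = raw.getvalue()
tail = stream.read()            # continues from logical offset 4

# Universal-newline translation across a chunk boundary.
decoder = io.IncrementalNewlineDecoder(None, translate=True)
part1 = decoder.decode("a\r")   # the "\r" is held back: it may be half of "\r\n"
part2 = decoder.decode("\nb")   # the pair collapses into a single "\n"

print(head, after_write, tail)
print(repr(part1), repr(part2))
```

Both cases depend on state that survives between calls, which is why a stateless shim gets them wrong only when the input happens to be split at an awkward point.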

The codec layer underneath TextIOWrapper got its own ports:

  • Real charmap codecs (PR #46). Modules/_codecsmodule.c charmap_encode and charmap_decode ported in full. codecs.make_encoding_map builds the inverse table the way CPython does.
  • utf-16, utf-32, cp1252, cp1250, cp1251, cp437, mac-roman (PR #44). Each carries a CPython reference vector in the test table so future codec edits cannot silently regress. utf-16 / utf-32 honor BE / LE variants and BOM sniffing; the 8-bit code pages use real lookup tables generated from Lib/encodings/cp*.py.
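
What "stateful" means for these codecs is that a decoder stopped mid-character can hand its state to another decoder, which is the property the tell cookie relies on. A sketch using the standard codecs API; split_decode is a hypothetical helper for this example:

```python
import codecs


def split_decode(encoding, data, cut):
    """Decode data in two steps, moving the decoder state to a fresh decoder in between."""
    first = codecs.getincrementaldecoder(encoding)()
    head = first.decode(data[:cut])
    state = first.getstate()          # (pending bytes, codec-specific flags)

    second = codecs.getincrementaldecoder(encoding)()
    second.setstate(state)            # resume exactly where the first decoder stopped
    tail = second.decode(data[cut:], final=True)
    return head, state, tail


# utf-8: cut in the middle of the two-byte sequence for "é".
utf8_head, utf8_state, utf8_tail = split_decode("utf-8", "é".encode("utf-8"), 1)

# utf-16: cut after the BOM and the first byte of "h", so the state carries
# both a pending byte and the byte order learned from the BOM.
utf16_head, utf16_state, utf16_tail = split_decode("utf-16", "hi".encode("utf-16"), 3)

print(repr(utf8_head), utf8_state, repr(utf8_tail))
print(repr(utf16_head), repr(utf16_tail))
```

In the utf-16 case the second decoder never sees the BOM, so it can only decode correctly if the byte order travels inside the state.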

Object protocol (spec 1704)

Tracked in PR #26 as the phases-2-through-8 bundle.

  • objects/classmethod.go, objects/staticmethod.go, objects/function.go rebuilt against the upstream member / getset tables. The CPython names appear inline as comments, so finding where a function such as func_get_qualname lives in Go is a single grep rather than a scroll through the file.
  • objects/method.go reworked. PyMethod_Type getset goes through objects.NewGetSet with the upstream getter / setter shape.
  • objects/type_new.go is new. The CPython type_new pipeline was previously distributed across half a dozen NewUserType call sites; the rewrite collapses them into a single ordered pass that mirrors Objects/typeobject.c:3300 type_new_impl.
  • objects/inherit_slots.go is new. Walks every slot edge once at class-creation time and inherits through the same propagation rules CPython uses.
  • vm/store_name.go, vm/load_name.go, vm/delete_name.go refactored to take the same dict-fast-path-vs-protocol-call split CPython's ceval does.

PEP 649 / 749 (spec 1706)

Tracked across commits ef648d2, 4c295f2, 5bb581a, 2d312cf, 78240a3, fa00aa8. Eight phases inline.

  • compile/symtable.go learns annotation scopes.
  • compile/codegen.go rewrites annotated-assignment statements into __annotate__ body emission.
  • objects/type_annotate.go, objects/function_annotate.go, objects/module_annotate.go provide the lazy __annotations__ getsets.
  • stdlib/annotationlib.py vendored byte for byte.

The t-string pipeline (PEP 750) rides along because annotationlib.Format.STRING reuses it. Lazy annotations that contain forward references format as t-strings and resolve at access time the way CPython does.

VM and compile

  • exc.__traceback__ carries a real frame chain (PR #52). Every raised exception now has a populated __traceback__ from the moment it leaves the raising frame. traceback.format_exc() walks the chain and produces a multi-frame render that matches CPython byte for byte. The earlier behavior was a single-frame stub.
  • Bare Go errors get a traceback too (commit 921343a). When the VM surfaces a Go error that bubbled up from a builtin (the classic 1 / 0 case), the raise path now synthesizes a frame stack from the current evaluation frame so the user sees the full call chain rather than a one-line ZeroDivisionError with no context.
  • Per-instruction location table (spec 1708, PR #53). Python/assemble.c emit_location_info ported in full. Every bytecode instruction carries its source (line, end_line, col, end_col) quadruple in the same wire format CPython uses, which means code.co_positions() returns identical iterators to upstream and dis.dis highlights the same source ranges.
  • CodeType binding in lift helpers (PR #51). Builtin code objects now expose every co_* attribute: co_argcount, co_posonlyargcount, co_kwonlyargcount, co_nlocals, co_stacksize, co_flags, co_code, co_consts, co_names, co_varnames, co_freevars, co_cellvars, co_filename, co_name, co_qualname, co_firstlineno, co_linetable, co_exceptiontable. Reflection tools (inspect.signature, dis.code_info) work against any code object now, not just the ones we happened to bind by hand.
  • traceback module frame renders as <module> (PR #54). The outermost frame in a traceback used to render with the source filename in the name slot; it now renders as <module> to match CPython. print(file=...) against a stdlib File instance no longer raises spurious type errors.
  • SEND / END_SEND stack discipline (commit e767443). Generators ran with a one-slot stack imbalance because the END_SEND opcode was popping the value it should have left for the consuming frame. Fixed; generator goroutines now leave the exact stack shape CPython's _PyEval_EvalFrameDefault does.
  • sys.exception() (commit d801664). The Python 3.11+ except-block introspection helper. Returns the currently-handled exception or None, reading the same handled-exception slot sys.exc_info() does.
  • IMPORT_NAME builtins propagation (commit d797ab1). Imported modules inherit the importing frame's __builtins__, which means a module imported from a sandboxed builtins namespace stays sandboxed. The old code reached for the global __builtins__ unconditionally.
  • tuple.__mul__ (commit 816a321). The sq_repeat slot was missing for tuple, so (1, 2) * 3 raised TypeError instead of returning (1, 2, 1, 2, 1, 2). Surfaced by the argparse vendor.
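
The traceback items can be checked with a short script. A sketch assuming CPython-compatible behaviour; inner, outer, and capture are illustrative names:

```python
import traceback


def inner():
    return 1 / 0


def outer():
    return inner()


def capture():
    """Raise through two frames and report what the traceback chain contains."""
    try:
        outer()
    except ZeroDivisionError as exc:
        # Walk exc.__traceback__ from the handling frame down to the raising frame.
        frames = [entry.name for entry in traceback.extract_tb(exc.__traceback__)]
        rendered = traceback.format_exc()
        return frames, rendered


frames, rendered = capture()
print(frames)
print(rendered)
```

With a single-frame stub the list would contain one entry; a real chain lists every frame between the handler and the raise site.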

Modules and ports

  • _socket Windows port (PR #43). The POSIX-only socket entry points (fileno-based descriptor passing, socketpair) are gated behind //go:build !windows, and a _socket_windows.go file publishes the same public surface backed by golang.org/x/sys/windows. The Windows test lane is now green for _socket.
  • Vendor Lib/socket.py (PR #48). The Python-level socket wrapper drops to upstream verbatim. The bogus _socket.makefile stub goes; the real socket.SocketIO class handles makefile against the unified buffered IO layer.
  • _thread._local real per-thread storage (PR #45). The previous implementation backed threading.local onto a single process-wide dict. Now each goroutine carries its own dict, and local.__init__ replays against each new thread the way CPython does. The args / kwargs captured at construction time are stored on the local object and re-run lazily on first access in each thread.
  • dataclasses.make_dataclass (PR #47). Ports the procedural dataclass builder. Clears the corresponding CI debt rows in the spec table.
  • functools PathFinder (PR #49). Drops the dead time / _colorize / pprint shims that were left over from the v0.12.0 import-chain workaround. functools resolves entirely through the PathFinder now.
  • _collections deque and defaultdict surface (PR #37). Rounds out the methods Modules/_collectionsmodule.c ships: deque.copy, deque.__reduce__, deque.insert, deque.maxlen, defaultdict.__missing__, defaultdict.copy, defaultdict.__reduce__.
  • signal.ItimerError (PR #36). Registered as a public exception class. setitimer / getitimer raise it rather than a bare OSError now.
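
The _thread._local change is the easiest of these to observe. A sketch of the per-thread replay, assuming CPython-compatible threading.local semantics; Context is a hypothetical subclass for this example:

```python
import threading


class Context(threading.local):
    def __init__(self, default):
        # Re-run with the original arguments the first time each thread touches the object.
        self.value = default


ctx = Context("unset")
ctx.value = "main"          # visible only to the main thread
seen_in_worker = []


def worker():
    seen_in_worker.append(ctx.value)  # fresh per-thread dict, so __init__ ran again
    ctx.value = "worker"              # does not leak back to the main thread


thread = threading.Thread(target=worker)
thread.start()
thread.join()

main_value = ctx.value
print(seen_in_worker, main_value)
```

With a single process-wide dict the worker would have read "main" and the main thread would have read "worker" afterwards.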

Test infrastructure

  • Regrtest smoke test (PR #41). The TestRunSmokeTest gate runs a known-passing slice of Lib/test/test_*.py through the runner on every CI lane. A regression in the import chain trips this gate before the individual test ports notice.
  • Vendor Lib/test/support helpers (PR #40). Drops the CPython support helpers byte for byte: test.support, test.support.os_helper, test.support.threading_helper, test.support.warnings_helper. The shims that previously stood in for these go away.
  • Argparse resync to 3.14.5 (PR #39). The vendored argparse.py was at 3.14.0; this resyncs to 3.14.5. Picks up the upstream BooleanOptionalAction bugfix.
  • os.posixmodule.c surface (PR #38). Fills in remaining posixmodule.c entries: pathconf, pathconf_names, confstr, confstr_names, sysconf, sysconf_names, WCOREDUMP and the rest of the W* wait macros.
  • Unittest import unblock (PR #42). Three small but load-bearing fixes: bytearray += bytes (the sq_inplace_concat slot was missing for bytearray when the right operand was bytes); memoryview growing its method list to include tobytes, tolist, hex, cast, release, __enter__, __exit__; and the linecache closure capturing the wrong filename variable.
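
The first two fixes from the unittest unblock look like this from Python. A sketch assuming CPython-compatible behaviour:

```python
# bytearray += bytes extends the existing object in place.
buf = bytearray(b"ab")
alias = buf
buf += b"cd"

# memoryview as a context manager, with the methods named in the list above.
with memoryview(b"\x01\x02\x03\x04") as view:
    as_list = view.tolist()
    as_hex = view.hex()
    pair_count = len(view.cast("H"))  # reinterpret 4 bytes as 2 unsigned shorts

# Leaving the with block releases the view; later access raises ValueError.
released = False
try:
    view.tobytes()
except ValueError:
    released = True

print(buf, as_list, as_hex, pair_count, released)
```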

CI and lint

  • Golangci-lint 21-issue scrub (commit b172636). Every issue on the lint run as of v0.12.2 fixed. The lane is now configured to fail PRs that introduce new findings.
  • gofumpt and gofmt enforcement (commits d4e6442, 91e3640). module/io is now gofumpt-clean. The PR template assumes lint is green before review.
  • CPython 3.14.5 sync audit (spec 1707, commit 100447b). The upstream pin moves from 3.14.0 to 3.14.5. Net diff in our ports is tiny because the upstream patch releases are mostly stdlib bugfixes, but the audit catches the ones we care about (argparse.BooleanOptionalAction, traceback.TracebackException edge cases).

Compatibility

A few user-visible changes are worth flagging.

  • tell and seek on text streams take big ints. CPython has always returned a 21-byte cookie that overflows int64 once the stream is large or the codec is mid-character. Code that captured f.tell() into a typed int64 variable and round-tripped it through f.seek(pos) now gets a real int back; the Go-side bridging passes through *big.Int. Most user code is unaffected because a Python int is unbounded.
  • exc.__traceback__ is populated. Code that defensively checked if e.__traceback__ is None: will find it is never None now (unless explicitly cleared). traceback.format_exc() produces multi-frame output where it previously produced one-line stubs.
  • __annotations__ is lazy. Reading Cls.__annotations__ triggers a call to Cls.__annotate__(VALUE) on first access. Code that monkey-patched cls.__annotations__ to a dict before the class body finished now needs to write through the __annotate__ function or accept that the lazy getset will overwrite the patch on first read.
  • bytearray += bytes works. If you had a workaround that spelled this as bytearray.extend(b) because the inplace operator raised, that workaround can go.
  • tuple * int works against any int. The sq_repeat fix means (1, 2) * N honors N for any int rather than the limited subset the broken path accepted.
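
A short self-check for two of these changes, written as a sketch against CPython-compatible behaviour; traceback_states is a hypothetical helper name:

```python
def traceback_states():
    """Report whether a caught exception has a traceback, and whether clearing it works."""
    try:
        raise ValueError("boom")
    except ValueError as exc:
        populated = exc.__traceback__ is not None
        cleared = exc.with_traceback(None).__traceback__ is None
        return populated, cleared


populated, cleared = traceback_states()

# tuple repetition for negative, zero, positive, and bool counts.
repeats = [(n, (1, 2) * n) for n in (-1, 0, 3, True)]

print(populated, cleared)
print(repeats)
```

Code that branches on e.__traceback__ is None should expect the populated case unless something cleared the attribute explicitly, as with_traceback(None) does here.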

What's next

The big remaining work for v0.12.4 onwards is the test corpus.

  • The CPython Lib/test/test_*.py gate at scale. The smoke test that landed in PR #41 runs a curated slice. Expanding it to the full manifest requires the remaining shimmed modules (sqlite3, tkinter, curses, the multiprocessing / asyncio chains) to land, and then porting the test-runner-shaped helpers (test.support.bytecode_helper, test.support.script_helper) so the individual test files run.
  • Spec 1703 _sre polish. The regex engine is real but a few rare-opcode paths still take the slow interpreter loop instead of the specialized one CPython falls into; the gap is small (single digits of percent on micro-benchmarks) but worth closing.
  • Spec 1709 follow-up. A second pass on the TextIOWrapper state model will look at whether we can hold the snapshot in a smaller struct than the current per-chunk allocation.

Networking, multiprocessing, asyncio, sqlite3, ctypes, tk / curses, GUI tests, gdb / dtrace remain out of scope.

Acknowledgments

This release closes work tracked across these public-facing items:

  • Spec 1702 (io subsystem full port). All seven file ports closed.
  • Spec 1704 (object protocol full port). Phases 2 through 8 shipped.
  • Spec 1706 (PEP 649 / 749 deferred annotations). All eight phases shipped.
  • Spec 1707 (CPython 3.14.5 sync audit). Pin moved, diff reviewed.
  • Spec 1708 (Python/assemble.c location-emission). Full port.
  • Spec 1709 (textio.c internals). Four phases plus final gate.
  • Issue #544 (enum import). Closed by object protocol Phase 8.
  • Issue #543 (re import end to end). Closed by the v0.12.2 regex engine plus the v0.12.3 codec layer.
  • Issue #542 (fnmatch delegate). Closed.
  • Issue #510 (re / _sre full port). Closed.

The commit log covering everything since v0.12.2 is at compare v0.12.2..v0.12.3. The pull request bundle covering the spec 1702 finalization is PR #27; the object protocol phase set is PR #26; the textio internals final phase is PR #57.