Skip to main content

Abstract syntax tree

The AST is the second compiler stage: the PEG parser builds a CST, which is immediately lowered to an AST whose nodes are defined by Parser/Python.asdl. The compiler operates on AST nodes; the public ast module exposes the same hierarchy to Python.

Source-of-record: Parser/Python.asdl, Python/Python-ast.c (generated), Lib/ast.py.

ASDL grammar

The AST is defined in ASDL (Zephyr Abstract Syntax Definition Language). The top level:

module Python
{
mod = Module(stmt* body, type_ignore* type_ignores)
| Interactive(stmt* body)
| Expression(expr body)
| FunctionType(expr* argtypes, expr returns)
}

The four mod productions correspond to the four compile modes (see Top-level components).

Statement nodes

NodeFields
FunctionDefname, args, body, decorator_list, returns, type_comment, type_params
AsyncFunctionDefSame as FunctionDef.
ClassDefname, bases, keywords, body, decorator_list, type_params
Returnvalue
Deletetargets
Assigntargets, value, type_comment
TypeAliasname, type_params, value
AugAssigntarget, op, value
AnnAssigntarget, annotation, value, simple
Fortarget, iter, body, orelse, type_comment
AsyncForSame as For.
Whiletest, body, orelse
Iftest, body, orelse
Withitems, body, type_comment
AsyncWithSame as With.
Matchsubject, cases
Raiseexc, cause
Trybody, handlers, orelse, finalbody
TryStarSame as Try. (PEP 654)
Asserttest, msg
Importnames
ImportFrommodule, names, level
Globalnames
Nonlocalnames
Exprvalue
Pass / Break / Continue(no fields)

Expression nodes

NodeFields
BoolOpop, values
NamedExprtarget, value (walrus)
BinOpleft, op, right
UnaryOpop, operand
Lambdaargs, body
IfExptest, body, orelse
Dictkeys, values
Setelts
ListCompelt, generators
SetCompelt, generators
DictCompkey, value, generators
GeneratorExpelt, generators
Awaitvalue
Yieldvalue
YieldFromvalue
Compareleft, ops, comparators
Callfunc, args, keywords
FormattedValuevalue, conversion, format_spec
JoinedStrvalues (used for f-strings)
Interpolationvalue, str, conversion, format_spec (PEP 750 t-string)
TemplateStrvalues (PEP 750 t-string outer)
Constantvalue, kind
Attributevalue, attr, ctx
Subscriptvalue, slice, ctx
Starredvalue, ctx
Nameid, ctx
Listelts, ctx
Tupleelts, ctx
Slicelower, upper, step

Context kinds (expr_context)

KindWhen
LoadRead context.
StoreAssignment target.
Deldel target.

Operator nodes

boolop

And, Or.

operator (binary)

Add, Sub, Mult, MatMult, Div, Mod, Pow, LShift, RShift, BitOr, BitXor, BitAnd, FloorDiv.

unaryop

Invert, Not, UAdd, USub.

cmpop

Eq, NotEq, Lt, LtE, Gt, GtE, Is, IsNot, In, NotIn.

Comprehension nodes

comprehension = (expr target, expr iter, expr* ifs, int is_async)

Exception handler

excepthandler = ExceptHandler(expr? type, identifier? name, stmt* body)

Same node for except and except*; the parent Try vs TryStar distinguishes.

Function definitions

arguments

arguments = (arg* posonlyargs, arg* args, arg? vararg,
arg* kwonlyargs, expr* kw_defaults, arg? kwarg,
expr* defaults)

arg

arg = (identifier arg, expr? annotation, string? type_comment)

keyword

keyword = (identifier? arg, expr value)

arg is None for **kwargs unpacking.

With items

withitem = (expr context_expr, expr? optional_vars)

Match nodes

match_case

match_case = (pattern pattern, expr? guard, stmt* body)

Patterns (pattern)

NodeFields
MatchValuevalue
MatchSingletonvalue
MatchSequencepatterns
MatchMappingkeys, patterns, rest
MatchClasscls, patterns, kwd_attrs, kwd_patterns
MatchStarname
MatchAspattern, name
MatchOrpatterns

Type parameter nodes (PEP 695)

NodeFields
TypeVarname, bound, default_value
ParamSpecname, default_value
TypeVarTuplename, default_value

Aliases

alias = (identifier name, identifier? asname)

Used by Import and ImportFrom.

Type ignore

type_ignore = TypeIgnore(int lineno, string tag)

Per-node attributes

Every node carries:

AttributeMeaning
linenoFirst source line (1-based).
col_offsetColumn on lineno.
end_linenoLast source line.
end_col_offsetColumn on end_lineno.

ast module API

FunctionRole
ast.parse(source, filename='<unknown>', mode='exec', type_comments=False, feature_version=None)Source -> AST.
ast.unparse(node)AST -> source.
ast.dump(node, annotate_fields=True, include_attributes=False, indent=None)Repr.
ast.literal_eval(node_or_string)Safe literal evaluation.
ast.walk(node)Iterate over descendants.
ast.iter_fields(node)Iterate (name, value) pairs.
ast.iter_child_nodes(node)Direct child nodes.
ast.NodeVisitorVisitor base class.
ast.NodeTransformerIn-place transformer.
ast.fix_missing_locations(node)Copy lineno/col from parent.
ast.copy_location(new_node, old_node)Copy location attributes.
ast.increment_lineno(node, n=1)Shift line numbers.
ast.get_docstring(node, clean=True)Extract docstring from a module/function/class.

AST optimisations

Before reaching codegen, the AST passes through optimisation steps:

PassEffect
Constant foldingFold BinOp, UnaryOp, BoolOp over constants.
String concatenationAdjacent string literals merge.
Negative-literal materialisationUnaryOp(USub, Constant(n)) -> Constant(-n) for ints.
Optimised if / whileConditional constants short-circuit.
Doctring detectionFirst statement of a function/class/module that is a Constant str is moved to co_consts[0].

Gopy status

AreaState
Every AST nodeGenerated from Python.asdl.
ast module surfaceComplete.
ast.dump/ast.unparse parityComplete; byte-equivalent to CPython.
AST optimisationsComplete.

Reference

  • Parser/Python.asdl. Canonical grammar.
  • Python/Python-ast.c. Generated node implementations.
  • Lib/ast.py. Python-facing API.
  • ast/. gopy's port.
  • Grammar for the surface syntax this AST encodes.