Understanding the Go Compiler: The Parser

📚 Understanding the Go Compiler (2 of 4)
  1. The Scanner
  2. The Parser (you are here)
  3. The Type Checker
  4. The Unified IR Format
The Parser

In the previous blog post, we explored the scanner—the component that converts your source code from a stream of characters into a stream of tokens.

Now we’re ready for the next step: the parser.

Here’s the challenge the parser solves: right now, we have a flat list of tokens with no relationships between them. The scanner gave us package, main, {, fmt, ., Println… but it has no idea that Println belongs to the fmt package, or that the entire fmt.Println("Hello, World!") statement lives inside the main function.

The parser’s job is to take that flat token stream and give it structure. It does this by building what’s called an Abstract Syntax Tree (or AST for short).

What is an Abstract Syntax Tree?

An AST is a tree data structure where each node represents a meaningful construct in your code—things like function definitions, variable declarations, expressions, or statements. The tree structure itself captures the relationships between these constructs.

For example, code inside a function becomes a child node of that function in the tree. An identifier used in an expression becomes part of that expression’s subtree. The tree structure tells us “this belongs to that.”

This is probably easier to understand with a concrete example. Let’s go back to our hello world program:

package main

import "fmt"

func main() {
    fmt.Println("Hello, World!")
}

This source code can be represented as the following AST:

     0  *ast.File {
     1  .  Package: 1:1
     2  .  Name: *ast.Ident {
     3  .  .  NamePos: 1:9
     4  .  .  Name: "main"
     5  .  }
     6  .  Decls: []ast.Decl (len = 2) {
     7  .  .  0: *ast.GenDecl {
     8  .  .  .  TokPos: 3:1
     9  .  .  .  Tok: import
    10  .  .  .  Lparen: -
    11  .  .  .  Specs: []ast.Spec (len = 1) {
    12  .  .  .  .  0: *ast.ImportSpec {
    13  .  .  .  .  .  Path: *ast.BasicLit {
    14  .  .  .  .  .  .  ValuePos: 3:8
    15  .  .  .  .  .  .  Kind: STRING
    16  .  .  .  .  .  .  Value: "\"fmt\""
    17  .  .  .  .  .  }
    18  .  .  .  .  .  EndPos: -
    19  .  .  .  .  }
    20  .  .  .  }
    21  .  .  .  Rparen: -
    22  .  .  }
    23  .  .  1: *ast.FuncDecl {
    24  .  .  .  Name: *ast.Ident {
    25  .  .  .  .  NamePos: 5:6
    26  .  .  .  .  Name: "main"
    27  .  .  .  .  Obj: *ast.Object {
    28  .  .  .  .  .  Kind: func
    29  .  .  .  .  .  Name: "main"
    30  .  .  .  .  .  Decl: *(obj @ 23)
    31  .  .  .  .  }
    32  .  .  .  }
    33  .  .  .  Type: *ast.FuncType {
    34  .  .  .  .  Func: 5:1
    35  .  .  .  .  Params: *ast.FieldList {
    36  .  .  .  .  .  Opening: 5:10
    37  .  .  .  .  .  Closing: 5:11
    38  .  .  .  .  }
    39  .  .  .  }
    40  .  .  .  Body: *ast.BlockStmt {
    41  .  .  .  .  Lbrace: 5:13
    42  .  .  .  .  List: []ast.Stmt (len = 1) {
    43  .  .  .  .  .  0: *ast.ExprStmt {
    44  .  .  .  .  .  .  X: *ast.CallExpr {
    45  .  .  .  .  .  .  .  Fun: *ast.SelectorExpr {
    46  .  .  .  .  .  .  .  .  X: *ast.Ident {
    47  .  .  .  .  .  .  .  .  .  NamePos: 6:5
    48  .  .  .  .  .  .  .  .  .  Name: "fmt"
    49  .  .  .  .  .  .  .  .  }
    50  .  .  .  .  .  .  .  .  Sel: *ast.Ident {
    51  .  .  .  .  .  .  .  .  .  NamePos: 6:9
    52  .  .  .  .  .  .  .  .  .  Name: "Println"
    53  .  .  .  .  .  .  .  .  }
    54  .  .  .  .  .  .  .  }
    55  .  .  .  .  .  .  .  Lparen: 6:16
    56  .  .  .  .  .  .  .  Args: []ast.Expr (len = 1) {
    57  .  .  .  .  .  .  .  .  0: *ast.BasicLit {
    58  .  .  .  .  .  .  .  .  .  ValuePos: 6:17
    59  .  .  .  .  .  .  .  .  .  Kind: STRING
    60  .  .  .  .  .  .  .  .  .  Value: "\"Hello, World!\""
    61  .  .  .  .  .  .  .  .  }
    62  .  .  .  .  .  .  .  }
    63  .  .  .  .  .  .  .  Ellipsis: -
    64  .  .  .  .  .  .  .  Rparen: 6:32
    65  .  .  .  .  .  .  }
    66  .  .  .  .  .  }
    67  .  .  .  .  }
    68  .  .  .  .  Rbrace: 7:1
    69  .  .  .  }
    70  .  .  }
    71  .  }
    72  .  Scope: *ast.Scope {
    73  .  .  Objects: map[string]*ast.Object (len = 1) {
    74  .  .  .  "main": *(obj @ 27)
    75  .  .  }
    76  .  }
    77  .  Imports: []*ast.ImportSpec (len = 1) {
    78  .  .  0: *(obj @ 12)
    79  .  }
    80  .  Unresolved: []*ast.Ident (len = 1) {
    81  .  .  0: *(obj @ 46)
    82  .  }
    83  }

Let’s break down what we’re seeing here:

  • The root node is *ast.File—representing the entire source file
  • Package name (main) is stored directly in the file node
  • Declarations are in a list (Decls), containing both the import and the function declaration
  • The main function has its own subtree with a name, type information (parameters, return values), and a body
  • Inside the body you can find the fmt.Println call, which itself is broken down into its components: the selector expression (fmt.Println) and the arguments ("Hello, World!")

The key insight is this: the AST is just another representation of your code, structured in a way that makes it easy for the compiler to understand relationships and semantics.

Try It Yourself

Want to explore ASTs yourself? The Go standard library includes an ast package that lets you parse Go code and inspect the resulting tree. Here’s a complete program you can run:

package main

import (
	"go/ast"
	"go/parser"
	"go/token"
)

func main() {
	// src is the input for which we want to print the AST.
	src := `package main

import "fmt"

func main() {
	fmt.Println("Hello, World!")
}
`

	// Create the AST by parsing src.
	fset := token.NewFileSet() // positions are relative to fset
	f, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		panic(err)
	}

	// Print the AST.
	ast.Print(fset, f)
}

Now that we understand what an AST looks like, let’s dive into how the parser actually builds this tree structure from the stream of tokens.

How the Parser Works

The parser has some similarities to the scanner we explored in the previous post. Like the scanner, it processes input one piece at a time—except instead of reading one character at a time, the parser reads one token at a time.

But here’s a key difference: the scanner returns each token as it finds it, streaming them out one by one. The parser, on the other hand, consumes all the tokens and builds up the entire tree before returning anything. You don’t get individual nodes—you get the complete AST for the whole file when parsing is done.
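
You can see this contrast for yourself with the standard library’s go/scanner and go/parser packages, which mirror the compiler’s internal ones. The small illustrative program below (it uses the exported packages, not the compiler’s internal syntax package) shows the scanner handing you one token per Scan call, while the parser hands back the finished *ast.File in one shot:

package main

import (
	"fmt"
	"go/parser"
	"go/scanner"
	"go/token"
)

func main() {
	src := []byte("package main\n\nfunc main() {}\n")
	fset := token.NewFileSet()

	// The scanner streams tokens one at a time.
	var s scanner.Scanner
	s.Init(fset.AddFile("", fset.Base(), len(src)), src, nil, 0)
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		fmt.Print(tok, " ") // package IDENT ; func IDENT ( ) { } ;
	}
	fmt.Println()

	// The parser consumes the whole token stream and returns the complete AST.
	f, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("top-level declarations:", len(f.Decls)) // 1 (func main)
}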

Another interesting detail: the parser actually embeds the scanner directly inside itself. Here’s what that looks like in the source code:

type parser struct {
    file  *PosBase
    errh  ErrorHandler
    mode  Mode
    pragh PragmaHandler
    scanner

    base   *PosBase // current position base
    first  error    // first error encountered
    errcnt int      // number of errors encountered
    pragma Pragma   // pragmas

    fnest  int    // function nesting level (for error handling)
    xnest  int    // expression nesting level (for complit ambiguity resolution)
    indent []byte // tracing support
}

See that scanner field with no name? That means the parser embeds the scanner type directly, giving it access to all the scanner’s methods. This is how the parser can call next() to advance through tokens.
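
If embedding is unfamiliar, here is a minimal, self-contained sketch of the idea. The scanner and parser types below are simplified stand-ins, not the compiler’s real ones:

package main

import "fmt"

type scanner struct {
	tok string
}

// next advances to the next token (stubbed out here).
func (s *scanner) next() { s.tok = "ident" }

type parser struct {
	scanner // embedded by type only: parser gains the scanner's fields and methods
}

func main() {
	var p parser
	p.next()           // actually calls the embedded scanner's next()
	fmt.Println(p.tok) // and reads the embedded scanner's field: prints "ident"
}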

The parser always operates at the file level—you parse an entire file at once. You do this using either the Parse or ParseFile functions from the src/cmd/compile/internal/syntax package.

Let’s see how this parsing process kicks off.

Starting the Parse

The ParseFile function is simple: it reads the file contents and hands them off to the Parse function. Here’s where the real work begins:

func Parse(base *PosBase, src io.Reader, errh ErrorHandler, pragh PragmaHandler, mode Mode) (_ *File, first error) {
	defer func() {
		if p := recover(); p != nil {
			if err, ok := p.(Error); ok {
				first = err
				return
			}
			panic(p)
		}
	}()

	var p parser
	p.init(base, src, errh, pragh, mode)
	p.next()
	return p.fileOrNil(), p.first
}

Here’s what’s happening:

  1. The function creates a new parser and initializes it with the source code
  2. It calls next() to load the first token—this primes the parser, getting it ready to start recognizing patterns
  3. It calls fileOrNil() to parse the entire file and build the AST

Before we dive into what fileOrNil() does, let’s talk about the parsing approach the Go compiler uses. Understanding this will help you make sense of the code patterns you’re about to see.

Recursive Descent and Hierarchical Decomposition

The Go parser is classified as a recursive descent parser. This term describes a parsing technique where:

  1. Descent: The parser works top-down, starting from the highest-level construct (a file) and breaking it down into smaller pieces (declarations, statements, expressions)
  2. Recursive: Certain constructs can contain themselves—expressions can contain other expressions, statements can contain blocks of statements, types can contain other types

Now here’s an important distinction: most of what we’ll see in this article is actually hierarchical decomposition rather than direct recursion. Hierarchical decomposition means breaking a problem into smaller sub-problems and delegating to specialized functions—like how fileOrNil() calls constDecl(), which calls nameList(), which calls lower-level helpers. Each function handles one level of the hierarchy.

The true recursion happens in parts we won’t cover in detail—specifically when parsing expressions, statements, and types. For example, when parsing the expression a + (b * c), the expression parser has to call itself to parse the nested expression (b * c). That’s genuine recursion: the same function calling itself before it’s done.
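
To make that recursion concrete, here is a deliberately tiny, hypothetical parser (parseExpr and parseOperand are made-up names, and operator precedence is ignored to keep it short). Notice how parseOperand calls back into parseExpr whenever it meets an opening parenthesis:

package main

import (
	"fmt"
	"strings"
)

// parseExpr parses a pre-tokenized expression like "a + ( b * c )" and
// re-prints it with explicit grouping. It returns the parsed node and the
// position of the next unconsumed token.
func parseExpr(toks []string, pos int) (node string, next int) {
	node, pos = parseOperand(toks, pos)
	for pos < len(toks) && (toks[pos] == "+" || toks[pos] == "*") {
		op := toks[pos]
		right, p := parseOperand(toks, pos+1)
		node, pos = "("+node+" "+op+" "+right+")", p
	}
	return node, pos
}

func parseOperand(toks []string, pos int) (node string, next int) {
	if toks[pos] == "(" {
		// True recursion: a parenthesized operand is just another expression,
		// so the parser calls itself before it's done with the outer one.
		inner, p := parseExpr(toks, pos+1)
		return inner, p + 1 // skip the closing ")"
	}
	return toks[pos], pos + 1 // a plain identifier such as "a"
}

func main() {
	expr, _ := parseExpr(strings.Fields("a + ( b * c )"), 0)
	fmt.Println(expr) // prints: (a + (b * c))
}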

So while the Go parser is a recursive descent parser overall, the examples we’ll walk through primarily demonstrate the hierarchical decomposition pattern. The naming comes from the fact that the parser uses both techniques, with recursion being essential for handling nested language constructs.

Let’s walk through the parsing process step by step, starting from the root of the AST.

Parsing a File

The parser always starts by calling the fileOrNil function. This function is responsible for building the root node of our AST—the *File node.

Let’s look at how it works, piece by piece.

Creating the File Node

func (p *parser) fileOrNil() *File {
	if trace {
		defer p.trace("file")()
	}

	f := new(File)
	f.pos = p.pos()
    ...

The parser creates a new, empty File node—this will become the root of our AST. As parsing progresses, it’ll populate this node with everything it finds: the package name, imports, function declarations, and so on.

It also records the file’s position in the source code, which is useful later for error messages.

With the empty file node ready, the parser can start filling it in—beginning with the package declaration.

Parsing the Package Declaration

    ...
	// PackageClause
	if !p.got(_Package) {
		p.syntaxError("package statement must be first")
		return nil
	}

	f.Pragma = p.takePragma()
	f.PkgName = p.name()
	p.want(_Semi)

	// don't bother continuing if package clause has errors
	if p.first != nil {
		return nil
	}
    ...

Now the parser checks if the first token is package. Remember, every Go file must start with a package declaration. If that token isn’t there, the parser immediately reports a syntax error and gives up.

If it finds package, it reads the package name (like main) and stores it in f.PkgName. Then it expects a semicolon—which, as we learned in the scanner article, was automatically inserted by the scanner after the package name.

If there are any errors at this point, parsing stops. There’s no point continuing if we don’t even have a valid package declaration.

Now that we have a valid package, the parser can move on to the next part of a Go file: imports.

Parsing Imports

    ...
	// { ImportDecl ";" }
	for p.got(_Import) {
		f.DeclList = p.appendGroup(f.DeclList, p.importDecl)
		p.want(_Semi)
	}
    ...

The parser loops, repeatedly checking if the next token is import. As long as it keeps finding import tokens, it processes them one at a time.

For each import it finds, it calls p.importDecl—a function that knows how to parse a single import statement—and adds the result to the file’s declaration list. Then it expects a semicolon before moving on.

There’s a clever detail here: p.importDecl is passed as a function reference, not called directly. The appendGroup helper is the one that actually calls it. This design handles both styles of imports equally well:

import "fmt"              // Single import

import (                  // Grouped imports
    "fmt"
    "strings"
)

The loop continues until there are no more import tokens, then the parser moves on to the rest of the file.
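
To get a feel for why importDecl is passed in as a function value, here is a self-contained sketch of the appendGroup idea. This is hypothetical code (toy token handling, no pragmas or error recovery), not the compiler’s implementation:

package main

import "fmt"

// tokenStream is a toy stand-in for the parser's embedded scanner.
type tokenStream struct {
	toks []string
	pos  int
}

func (t *tokenStream) peek() string { return t.toks[t.pos] }
func (t *tokenStream) next() string { s := t.toks[t.pos]; t.pos++; return s }

// appendGroup parses either a single spec or a parenthesized group of specs.
// Because the "parse one spec" step is passed in as a function value, the same
// helper works for imports, constants, types, and variables.
func appendGroup(t *tokenStream, list []string, parseSpec func() string) []string {
	if t.peek() == "(" { // grouped form: import ( ... )
		t.next() // consume "("
		for t.peek() != ")" {
			list = append(list, parseSpec())
		}
		t.next() // consume ")"
		return list
	}
	return append(list, parseSpec()) // single form: import "fmt"
}

func main() {
	t := &tokenStream{toks: []string{"(", `"fmt"`, `"strings"`, ")"}}
	decls := appendGroup(t, nil, func() string { return t.next() })
	fmt.Println(decls) // ["fmt" "strings"]
}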

Parsing Top-Level Declarations

	// { TopLevelDecl ";" }
	for p.tok != _EOF {
		switch p.tok {
		case _Const:
			p.next()
			f.DeclList = p.appendGroup(f.DeclList, p.constDecl)

		case _Type:
			p.next()
			f.DeclList = p.appendGroup(f.DeclList, p.typeDecl)

		case _Var:
			p.next()
			f.DeclList = p.appendGroup(f.DeclList, p.varDecl)

		case _Func:
			p.next()
			if d := p.funcDeclOrNil(); d != nil {
				f.DeclList = append(f.DeclList, d)
			}

		default:
			if p.tok == _Lbrace && len(f.DeclList) > 0 && isEmptyFuncDecl(f.DeclList[len(f.DeclList)-1]) {
				// opening { of function declaration on next line
				p.syntaxError("unexpected semicolon or newline before {")
			} else {
				p.syntaxError("non-declaration statement outside function body")
			}
			p.advance(_Const, _Type, _Var, _Func)
			continue
		}

		// Reset p.pragma BEFORE advancing to the next token (consuming ';')
		// since comments before may set pragmas for the next function decl.
		p.clearPragma()

		if p.tok != _EOF && !p.got(_Semi) {
			p.syntaxError("after top level declaration")
			p.advance(_Const, _Type, _Var, _Func)
		}
	}

Here’s where the parser handles the rest of the file. At the top level of a Go file, only four things are allowed: constant declarations, type declarations, variable declarations, and function declarations.

The parser loops until it hits EOF, checking what kind of token it sees:

  • If it’s const, call constDecl to parse the constant
  • If it’s type, call typeDecl to parse the type definition
  • If it’s var, call varDecl to parse the variable
  • If it’s func, call funcDeclOrNil to parse the function

Each of these functions builds its own subtree (a ConstDecl node, a TypeDecl node, etc.) and returns it. The parser then adds that subtree to the DeclList in the file node.

If the parser encounters anything else—say, a statement like x := 5 sitting at the top level outside any function—it reports a syntax error. Go is strict about what belongs where.

Once the parser has worked through all the declarations and reaches the end of the file, there’s just one last step.

Finishing the File

	// p.tok == _EOF

	p.clearPragma()
	f.EOF = p.pos()

	return f
}

When the loop finishes (having hit EOF), the parser records the position of the end of the file and returns the complete File node.

And that’s it—we’ve built the root of our AST!

But remember, fileOrNil delegated work to other functions like constDecl, typeDecl, and funcDeclOrNil. Let’s zoom in on one of those to see how individual declarations get parsed. We’ll start with constant declarations since they’re relatively straightforward.

Parsing a Constant Declaration

Here’s the complete constDecl function:

func (p *parser) constDecl(group *Group) Decl {
	if trace {
		defer p.trace("constDecl")()
	}

	d := new(ConstDecl)
	d.pos = p.pos()
	d.Group = group
	d.Pragma = p.takePragma()

	d.NameList = p.nameList(p.name())
	if p.tok != _EOF && p.tok != _Semi && p.tok != _Rparen {
		d.Type = p.typeOrNil()
		if p.gotAssign() {
			d.Values = p.exprList()
		}
	}

	return d
}

Notice the pattern? Just like with the file node, we start by creating an empty node and then populate it.

First, it creates a new ConstDecl node and records its position.

Then it handles the group parameter. This deals with cases where you have multiple constants declared together inside parentheses:

const (
    A = 1
    B = 2
)

All constants in the parentheses share the same group. If you declare a constant standalone (const X = 5), the group is nil.

Next, it reads the constant name(s). Since you can declare multiple constants on one line (const A, B, C = 1, 2, 3), it collects all the names into a list.

After that, if there’s more to parse, it looks for two optional pieces:

  • A type annotation (like const X int = 5)
  • An assignment with values (like = 1, 2, 3)

Both are optional because Go can infer types from values. And in grouped constant blocks, later constants can even inherit values from earlier ones.
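
For example, all of these forms flow through the same constDecl code path: explicit types, inferred types, multiple names on one line, and grouped constants that inherit their expression from the previous line.

package main

const X int = 5         // explicit type and value
const A, B, C = 1, 2, 3 // multiple names on one line, types inferred

const (
	KB = 1 << (10 * (iota + 1)) // first constant in the group
	MB                          // no expression: repeats the previous one, with iota now 1
	GB                          // and again, with iota now 2
)

func main() {}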

Finally, the function returns the completed ConstDecl node, which gets added to the file’s declaration list.

This pattern—create a node, parse its components, return the node—is the fundamental structure for all parser functions. The varDecl function works almost identically, and typeDecl and funcDeclOrNil follow the same general approach, just with more complexity.

More Complex Cases

I haven’t covered everything here. Function declarations involve parsing parameter lists, return types, and function bodies (which contain statements and expressions). Type declarations can define structs, interfaces, and other complex types. Expression parsing handles things like operator precedence, function calls, type assertions, and more.

These follow the same hierarchical decomposition pattern we’ve seen—but they also involve the true recursion we discussed earlier, where parsers call themselves to handle nested constructs.

If you’re curious about the details, I encourage you to explore src/cmd/compile/internal/syntax/parser.go yourself—it’s surprisingly readable once you understand the basic patterns.

Now let’s see all these pieces working together in action.

Walking Through an Example

Let’s trace through our hello world program and see exactly how the parser builds its AST.

The parser starts with the token stream from the scanner. Remember what that looked like? package, IDENT(main), ;, import, STRING("fmt"), ;, func, IDENT(main), (, ), {, and so on.

Here’s how the parsing unfolds:

File creation: The parser calls fileOrNil() and creates a new File node.

Package declaration: It sees the package token, reads the name main, expects a semicolon (which the scanner inserted), and stores the package name in the file node.

Import declaration: It sees import, calls importDecl, which reads the string "fmt" and creates an ImportSpec node. This gets added to the file’s declaration list.

Function declaration: It sees func, calls funcDeclOrNil, which:

  • Reads the function name (main)
  • Parses the parameter list (empty: ())
  • Parses the return type (none)
  • Parses the function body (the { ... } block)

Inside the function body, the parser encounters a statement: fmt.Println("Hello, World!"). It recognizes this as an expression statement containing a call expression. The call expression has:

  • A selector expression (fmt.Println) as the function being called
  • An argument list containing one string literal ("Hello, World!")

The parser builds nodes for all of these: CallExpr, SelectorExpr, Ident nodes for fmt and Println, and a BasicLit node for the string.

Finishing up: When the parser hits EOF, it records the end position and returns the complete File node, which now contains the entire AST tree representing our program.

And that’s how a flat stream of tokens becomes a structured tree that captures the meaning and relationships in our code.

Now that you understand how the parser works, you might be wondering: can I use this knowledge in my own projects? Absolutely.

Using the AST in Your Own Code

Understanding the AST isn’t just useful for learning how compilers work—it’s a practical tool you can use in your own Go projects. Many Go developers leverage the go/ast package to parse Go code programmatically and build powerful tools.

Here are some interesting projects that use AST parsing:

  • mage - A Make alternative that uses Go code as its build script language
  • goa - A framework that uses ASTs to generate API implementations from high-level designs
  • goql - Lets you query Go packages using SQL, as if they were databases

These are just a few examples of what’s possible. You can build your own AST-based tools too: translatable text extractors, automatic mock generators, custom linters that enforce your team’s coding conventions, refactoring tools, and more. If you can represent what you’re looking for as patterns in code structure, the AST makes it accessible.
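
As a starting point, here is a small program that uses go/parser and ast.Inspect from the standard library to walk a parsed file and report every function call it finds, which is roughly where a simple custom linter would begin:

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

func main() {
	src := `package main

import "fmt"

func main() {
	fmt.Println("Hello, World!")
}
`

	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "", src, 0)
	if err != nil {
		panic(err)
	}

	// Visit every node in the AST; returning true keeps descending into children.
	ast.Inspect(f, func(n ast.Node) bool {
		if call, ok := n.(*ast.CallExpr); ok {
			fmt.Printf("function call at %s\n", fset.Position(call.Lparen))
		}
		return true
	})
}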

Let’s wrap up what we’ve learned.

Summary

The parser is the second phase of the Go compiler. It takes the flat stream of tokens from the scanner and builds an Abstract Syntax Tree—a structured representation that captures the meaning and relationships in your code.

We’ve seen how the parser:

  • Uses hierarchical decomposition to break down parsing into small, focused functions, with each function handling one level of structure
  • Embeds the scanner to access tokens on demand
  • Builds the AST top-down, starting with the file node and working down through declarations, statements, and expressions
  • Follows a consistent pattern: create a node, parse its components, delegate to specialized functions for subtrees, return the completed node

The Go parser is strict and predictable. Every Go file must follow the same structure: package declaration, imports, then top-level declarations. This consistency makes Go code easy to parse—and easy for both humans and machines to understand.

If you want to go deeper, I highly recommend exploring src/cmd/compile/internal/syntax/parser.go. Once you understand the patterns we’ve covered, the rest of the parser becomes much more approachable.

In a future post, I’ll cover the Intermediate Representation (IR)—how the compiler transforms this AST into a lower-level representation that’s closer to actual machine operations. This is where the compiler starts preparing your code for optimization and code generation.