Golang Internals, Part 4: Object Files and Function Metadata

by Siarhei Matsiukevich, March 18, 2015

This blog post explores the structure of function metadata and how the garbage collector uses it.


Today, we’ll take a closer look at the Func structure and discuss a few details of how garbage collection works in Go.

This post is a continuation of “Golang Internals, Part 3: The Linker and Go Object Files” and uses the same sample program. So, we strongly advise that you read the previous part before moving forward.


The structure of function metadata

The main idea behind relocations should be clear from Part 3. Now let’s take a look at the Func structure of the main function.

Func: &goobj.Func{
    Args:    0,
    Frame:   8,
    Leaf:    false,
    NoSplit: false,
    Var:     {
    },
    PCSP:   goobj.Data{Offset:255, Size:7},
    PCFile: goobj.Data{Offset:263, Size:3},
    PCLine: goobj.Data{Offset:267, Size:7},
    PCData: {
        {Offset:276, Size:5},
    },
    FuncData: {
        {
            Sym:    goobj.SymID{Name:"gclocals·3280bececceccd33cb74587feedb1f9f", Version:0},
            Offset: 0,
        },
        {
            Sym:    goobj.SymID{Name:"gclocals·3280bececceccd33cb74587feedb1f9f", Version:0},
            Offset: 0,
        },
    },
    File: {"/home/adminone/temp/test.go"},
},

You can think of this structure as function metadata that the compiler emits in the object file and the Go runtime uses. The exact format and meaning of the different fields in Func are explained in a separate article; here, we will show how this metadata is used in the runtime.

Inside the runtime package, this metadata is mapped onto the following struct.

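type _func struct {
	entry   uintptr // start pc
	nameoff int32   // function name

	args  int32 // in/out args size
	frame int32 // legacy frame size; use pcsp if possible

	pcsp      int32 // offset of the pc -> stack pointer delta table
	pcfile    int32 // offset of the pc -> file table
	pcln      int32 // offset of the pc -> line table
	npcdata   int32 // number of PCDATA tables
	nfuncdata int32 // number of FUNCDATA entries
}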

You can see that not all the information in the object file has been mapped directly; some of the fields are used only by the linker. The most interesting fields here are pcsp, pcfile, and pcln, which the runtime uses to translate a program counter into a stack pointer offset, a file name, and a line number, respectively.

This is required, for example, when a panic occurs. At that moment, the runtime knows only the program counter of the instruction that triggered the panic. From that counter, it obtains the current file, line number, and full stack trace. The file and line number are resolved directly, using the pcfile and pcln tables. The stack trace is resolved recursively, using pcsp: the stack pointer offset at the current program counter tells the runtime where the caller's return address is saved, and that address is the next program counter to look up.
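The same pc-to-file and pc-to-line metadata is reachable from ordinary Go code through the standard runtime package. The sketch below is a minimal illustration, not the runtime's panic path: runtime.Caller and runtime.FuncForPC resolve a single program counter, and runtime.CallersFrames (added in Go releases later than the one this post describes) walks the whole stack. The helper names whereAmI and stackTrace are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// whereAmI reports the function name, file, and line of its caller.
// runtime.Caller(1) returns the caller's program counter; the runtime maps
// that counter to a file and line using the compiler-emitted tables.
func whereAmI() (string, string, int) {
	pc, file, line, ok := runtime.Caller(1)
	if !ok {
		return "unknown", "", 0
	}
	name := "unknown"
	if fn := runtime.FuncForPC(pc); fn != nil {
		name = fn.Name()
	}
	return name, file, line
}

// stackTrace walks the current goroutine's stack, turning each program
// counter into a function name, file, and line.
func stackTrace() []string {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(1, pcs) // skip=1 omits runtime.Callers itself
	frames := runtime.CallersFrames(pcs[:n])
	var out []string
	for {
		frame, more := frames.Next()
		out = append(out, fmt.Sprintf("%s %s:%d", frame.Function, frame.File, frame.Line))
		if !more {
			break
		}
	}
	return out
}

func main() {
	name, file, line := whereAmI()
	fmt.Println(name, file, line) // main.main plus this file and line
	for _, f := range stackTrace() {
		fmt.Println(f)
	}
}
```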

Now that we have a program counter, how do we get the corresponding line number? To answer that, we need to look at the assembly code and at how line numbers are stored in the object file.

0x001a 00026 (test.go:4)	MOVQ	$1,(SP)
0x0022 00034 (test.go:4)	PCDATA	$0,$0
0x0022 00034 (test.go:4)	CALL	,runtime.printint(SB)
0x0027 00039 (test.go:5)	ADDQ	$8,SP
0x002b 00043 (test.go:5)	RET	,

We can see that program counters 26 through 38 correspond to line 4, and counters from 39 up to the one just before the first program counter of the next function correspond to line 5. For space efficiency, it is enough to store only the program counter at which each line starts:

26 - 4
39 - 5
…
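Because the entries are sorted by program counter, looking up a line in this simplified map is a binary search for the last entry whose starting counter is not greater than the target. Below is a minimal sketch using sort.Search; the type and function names are hypothetical, and the function's end offset (44, just past the one-byte RET at 43) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// lineEntry says that every pc from startPC up to the next entry's
// startPC belongs to line.
type lineEntry struct {
	startPC int
	line    int
}

// lineForPC finds the line for pc by binary search. entries must be sorted
// by startPC; end is the first pc of the next function.
func lineForPC(entries []lineEntry, end, pc int) (int, bool) {
	if len(entries) == 0 || pc < entries[0].startPC || pc >= end {
		return 0, false
	}
	// Find the first entry that starts after pc; the entry before it covers pc.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].startPC > pc
	})
	return entries[i-1].line, true
}

func main() {
	// The map from the listing above: 26 -> line 4, 39 -> line 5.
	entries := []lineEntry{{26, 4}, {39, 5}}
	for _, pc := range []int{26, 34, 38, 39, 43} {
		line, _ := lineForPC(entries, 44, pc)
		fmt.Printf("pc %d -> line %d\n", pc, line)
	}
}
```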

This is almost exactly what the compiler does. The pcln field points to the part of this table that belongs to the current function, starting at its first program counter. To resolve a program counter, the runtime first finds the function that contains it; because functions are sorted by entry address, a binary search can be used for this step. It then decodes that function's table, which is stored compactly as a sequence of deltas, entry by entry until it reaches the one covering the counter.
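The table the runtime actually decodes is more compact than a list of pairs. In the format described in the Go 1.2 runtime symbol table design, each entry stores a value delta as a zig-zag signed varint followed by a pc delta as an unsigned varint, decoding starts from the value -1 at the function's entry, and the entries are walked in order until the target counter is passed. The sketch below models that scheme with encoding/binary. It ignores the architecture-specific pc quantum, the names are hypothetical, and attributing offsets 0 to 25 to line 3 is an assumption for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// span says that value applies to every pc offset from the previous
// span's end (or the function entry) up to, but not including, end.
type span struct {
	end   uint64 // pc offset from the function entry
	value int64
}

// encode builds a pc-value table: for each span, the value delta as a
// zig-zag signed varint, then the pc delta as an unsigned varint.
// Consecutive spans must have different values, because a zero value
// delta after the first entry marks the end of the table.
func encode(spans []span) []byte {
	var buf []byte
	tmp := make([]byte, binary.MaxVarintLen64)
	val, pc := int64(-1), uint64(0) // decoding starts from value -1 at the entry
	for _, s := range spans {
		n := binary.PutVarint(tmp, s.value-val)
		buf = append(buf, tmp[:n]...)
		n = binary.PutUvarint(tmp, s.end-pc)
		buf = append(buf, tmp[:n]...)
		val, pc = s.value, s.end
	}
	return append(buf, 0) // terminator: a zero value delta
}

// lookup decodes the table entry by entry until it reaches the span that
// contains target (a pc offset from the function entry).
func lookup(table []byte, target uint64) (int64, bool) {
	val, pc := int64(-1), uint64(0)
	for first := true; len(table) > 0; first = false {
		if table[0] == 0 && !first {
			return 0, false // end of table: target lies past the function
		}
		dv, n := binary.Varint(table)
		if n <= 0 {
			return 0, false // malformed table
		}
		table = table[n:]
		dpc, m := binary.Uvarint(table)
		if m <= 0 {
			return 0, false
		}
		table = table[m:]
		val += dv
		pc += dpc
		if target < pc {
			return val, true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical line table: offsets 0-25 -> line 3, 26-38 -> line 4,
	// 39-43 -> line 5.
	table := encode([]span{{26, 3}, {39, 4}, {44, 5}})
	fmt.Printf("% x\n", table) // 08 1a 02 0d 02 05 00
	for _, pc := range []uint64{0, 26, 38, 39, 43} {
		line, _ := lookup(table, pc)
		fmt.Printf("offset %d -> line %d\n", pc, line)
	}
}
```

Decoding is linear within one function, which is acceptable because each function's table is short; the deltas usually fit in a single byte each.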

In Go, this idea is generalized: a program counter can be mapped not only to a line number or a stack pointer offset but to any integer value. This is done via the PCDATA pseudo-instruction. Each time, the linker finds an instruction like the following:

0x0022 00034 (test.go:4)	PCDATA	$0,$0

PCDATA doesn’t generate any machine code. Instead, the linker records its second argument as the value for the current program counter in a table, and the first argument selects which table is used. Thanks to this first argument, new tables can easily be added whose meaning is known to the compiler and the runtime but opaque to the linker.

How a garbage collector uses function metadata

The last piece of function metadata that still needs explaining is the FuncData array, which contains information necessary for garbage collection. Go uses a mark-and-sweep garbage collector (GC) that operates in two phases. During the first (mark) phase, it traverses all objects that are still in use and marks them as reachable. During the second (sweep) phase, all unmarked objects are freed.
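The two phases can be illustrated with a toy, single-threaded heap. This is a minimal sketch of the textbook algorithm, not Go's collector, which is considerably more elaborate; the object and heap types are hypothetical. Note that the unreachable cycle (d points to itself) is still freed, since marking starts only from the roots.

```go
package main

import "fmt"

// object is a heap object that may hold pointers to other objects.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// heap is a toy allocator that tracks every object it has handed out.
type heap struct {
	objects []*object
}

func (h *heap) alloc(name string) *object {
	o := &object{name: name}
	h.objects = append(h.objects, o)
	return o
}

// mark sets the mark bit on every object reachable from roots,
// using an explicit stack (iterative depth-first traversal).
func mark(roots []*object) {
	stack := append([]*object(nil), roots...)
	for len(stack) > 0 {
		o := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if o == nil || o.marked {
			continue
		}
		o.marked = true
		stack = append(stack, o.refs...)
	}
}

// sweep drops unmarked objects, clears the marks on survivors for the
// next cycle, and returns the names of the freed objects.
func (h *heap) sweep() []string {
	var live []*object
	var freed []string
	for _, o := range h.objects {
		if o.marked {
			o.marked = false
			live = append(live, o)
		} else {
			freed = append(freed, o.name)
		}
	}
	h.objects = live
	return freed
}

func main() {
	h := &heap{}
	a, b, c, d := h.alloc("a"), h.alloc("b"), h.alloc("c"), h.alloc("d")
	a.refs = []*object{b}
	b.refs = []*object{c}
	d.refs = []*object{d} // unreachable self-cycle
	mark([]*object{a})    // a plays the role of a root
	fmt.Println("freed:", h.sweep()) // freed: [d]
}
```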

So, the garbage collector starts by looking for reachable objects in several known locations, such as global variables, processor registers, and stack frames, and then follows the pointers in objects that have already been reached. However, if you think about it carefully, finding pointers in stack frames is far from trivial. When the runtime is performing garbage collection, how does it tell whether a word on the stack is a pointer or a value of a non-pointer type? This is where FuncData comes into play.

For each function, the compiler creates two variables. One contains a bitmap for the arguments area of the stack frame. The other contains a bitmap for the rest of the frame, which holds the local variables defined in the function, including those of pointer types. Together, these bitmaps tell the garbage collector exactly where in the stack frame the pointers are located, and that information is enough for it to do its job.
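The following minimal sketch shows how per-region bitmaps let a collector pick out pointers in a frame precisely. It assumes one bit per word and a hypothetical frame type with separate locals and arguments areas; the runtime's real bitmap encoding and frame layout are not reproduced here.

```go
package main

import "fmt"

// bitvector marks which words of a frame region hold pointers:
// bit i set means word i is a pointer (bit i lives in bytes[i/8]).
type bitvector struct {
	n     int // number of words described
	bytes []byte
}

func (bv bitvector) isPointer(i int) bool {
	return bv.bytes[i/8]&(1<<uint(i%8)) != 0
}

// frame is a hypothetical stack frame split into a locals area and an
// arguments area, each described by its own bitmap.
type frame struct {
	locals, args       []uintptr
	localsMap, argsMap bitvector
}

// scanRegion returns the non-nil words that the bitmap marks as pointers.
// Words marked as non-pointers (integers and the like) are skipped even if
// their bits happen to look like addresses.
func scanRegion(words []uintptr, bv bitvector) []uintptr {
	var ptrs []uintptr
	for i := 0; i < bv.n && i < len(words); i++ {
		if bv.isPointer(i) && words[i] != 0 {
			ptrs = append(ptrs, words[i])
		}
	}
	return ptrs
}

// scanFrame reports every pointer a collector would treat as a root.
func scanFrame(f frame) []uintptr {
	roots := scanRegion(f.locals, f.localsMap)
	return append(roots, scanRegion(f.args, f.argsMap)...)
}

func main() {
	f := frame{
		locals:    []uintptr{0x1000, 42, 0xc000}, // words 0 and 2 are pointers
		localsMap: bitvector{n: 3, bytes: []byte{0x05}},
		args:      []uintptr{7, 0x2000}, // word 1 is a pointer
		argsMap:   bitvector{n: 2, bytes: []byte{0x02}},
	}
	for _, p := range scanFrame(f) {
		fmt.Printf("%#x\n", p)
	}
}
```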

It is also worth mentioning that, like PCDATA, FUNCDATA is generated by a pseudo-instruction in Go assembly:

0x001a 00026 (test.go:3)	FUNCDATA	$0,gclocals·3280bececceccd33cb74587feedb1f9f+0(SB)

The first argument of this instruction indicates whether the data describes the arguments area or the local variables area. The second argument is a reference to a hidden variable that contains the GC mask.

In the upcoming posts, we will investigate the Go bootstrap process, which is the key to understanding how the Go runtime works.