Notes on Go binary metadata

By default, Go binaries contain a wealth of information that can be immensely useful for reverse engineering. How this information is encoded depends on the Go compiler, the compiler version, and the target platform. For these notes, I'm using the official go1.17 compiler, targeting GOARCH=386 and GOOS=linux.

Function information

Go binaries contain symbol information and function metadata, including PC-to-line mappings. The Go runtime uses this information for stack unwinding and printing stack traces (and probably for other things as well).

At runtime, it’s possible to access this information by calling runtime.FuncForPC.

An easy way to view this information when reverse engineering is to inspect the binary with go tool objdump.

TEXT main.NewStruct(SB) /Users/danieltang/Desktop/godemo/test.go
  test.go:15		0x80c12a0		658b0d00000000		MOVL GS:0, CX
  test.go:15		0x80c12a7		8b89fcffffff		MOVL 0xfffffffc(CX), CX
  test.go:15		0x80c12ad		3b6108			CMPL 0x8(CX), SP
  test.go:15		0x80c12b0		761d			JBE 0x80c12cf
  test.go:15		0x80c12b2		83ec08			SUBL $0x8, SP
  test.go:16		0x80c12b5		8d05c0d20c08		LEAL 0x80cd2c0, AX
  test.go:16		0x80c12bb		890424			MOVL AX, 0(SP)
  test.go:16		0x80c12be		e81d19f9ff		CALL runtime.newobject(SB)
  test.go:16		0x80c12c3		8b442404		MOVL 0x4(SP), AX
  test.go:16		0x80c12c7		8944240c		MOVL AX, 0xc(SP)
  test.go:16		0x80c12cb		83c408			ADDL $0x8, SP
  test.go:16		0x80c12ce		c3			RET
  test.go:15		0x80c12cf		e8acd1fdff		CALL runtime.morestack_noctxt(SB)
  test.go:15		0x80c12d4		ebca			JMP main.NewStruct(SB)

For each function, the tool prints its name and source file, followed by its disassembly, with every instruction annotated with its source file name and line number.
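The original test.go isn't shown, so the program below is only a hypothetical reconstruction of the same shape, useful for reproducing a similar listing. Struct's fields are made up, and the line numbers won't match the listing above. Building it with GOOS=linux GOARCH=386 go build and running go tool objdump -s main.NewStruct on the result should show something similar:

```go
package main

import "fmt"

// Struct is a hypothetical stand-in; its fields are made up.
type Struct struct {
	A, B int
}

// NewStruct returns a pointer that escapes to the heap, so new(Struct)
// becomes a call to runtime.newobject. The noinline directive keeps it as a
// standalone function in the binary so objdump can find it.
//
//go:noinline
func NewStruct() *Struct {
	return new(Struct)
}

func main() {
	s := NewStruct()
	// Converting a typed nil pointer to interface{} stores the type
	// descriptor's address next to a nil data word.
	var i interface{} = (*Struct)(nil)
	fmt.Println(s != nil, i != nil) // the interface carries a type, so it isn't nil
}
```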

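At a lower level, PC-to-line mappings are stored in the .gopclntab section. Symbol definitions were historically stored in the .gosymtab section, but in recent Go versions that section is typically empty, and function names live in .gopclntab as well.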

It's possible to parse this information programmatically with the debug/gosym package. The debug/elf package also provides useful utilities for reading ELF files.

Type information

The Go compiler generates metadata about most (all?) types used in the program. It needs to generate this information in order for the runtime, reflection, and just about anything that uses interfaces to work.

Runtime

One part of the runtime that needs access to type information is the memory allocator. The type information is used to determine (at minimum) how much memory to allocate. I suspect it is also used by the garbage collector to discover pointers, but I have not confirmed this; the ptrdata and gcdata fields in the type descriptor header shown below point that way.

To illustrate this, new(Struct) roughly compiles to this snippet:

  test.go:16		0x80c12b5		8d05c0d20c08		LEAL 0x80cd2c0, AX
  test.go:16		0x80c12bb		890424			MOVL AX, 0(SP)
  test.go:16		0x80c12be		e81d19f9ff		CALL runtime.newobject(SB)

Conceptually, this looks something like runtime.newobject((*_type)(0x80cd2c0)).

The compiler builds a structure containing the type information, which ends up in the .rodata section of the binary. 0x80cd2c0 is the hard-coded address at which that structure will be located in memory when the program runs.

Interfaces

Conceptually, an interface is implemented as a structure that describes:

The type of the value
The value itself

Although Go does not expose these details, it's possible to see the underlying data structures by reading the code of the reflect package (for example, the emptyInterface and nonEmptyInterface types in reflect/value.go).

If a value of some type will ever be assigned to an interface, the Go compiler needs to generate type information for that type. As above, it stores the type descriptor in .rodata and hard-codes its address at the point of the interface assignment.

To illustrate again, (interface{})((*Struct)(nil)) roughly compiles to:

  test.go:20		0x80c12f2		8d0500600c08		LEAL 0x80c6000, AX
  test.go:20		0x80c12f8		89442404		MOVL AX, 0x4(SP)
  test.go:20		0x80c12fc		c744240800000000	MOVL $0x0, 0x8(SP)

Conceptually, this looks like emptyInterface{typ: (*_type)(0x80c6000), word: nil} under the hood. As before, 0x80c6000 is the address containing the structure that describes the type.
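A sketch that peeks at that layout directly, assuming the two-word {typ, word} shape of reflect's emptyInterface. It relies on unexported runtime details and package unsafe, so treat it as an experiment rather than something to depend on. eface, unpack and Struct are hypothetical names:

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the assumed layout of an empty interface: a pointer to the
// type descriptor, followed by the data word.
type eface struct {
	typ  unsafe.Pointer // *_type, the descriptor stored in .rodata
	word unsafe.Pointer // the value itself, or a pointer to it
}

// Struct is a hypothetical stand-in for the type in the example above.
type Struct struct{ x int }

// unpack reinterprets an interface{} as its two underlying words.
func unpack(i interface{}) eface {
	return *(*eface)(unsafe.Pointer(&i))
}

func main() {
	e := unpack((*Struct)(nil))
	fmt.Printf("typ=%p word=%p\n", e.typ, e.word)
}
```

Two separate conversions of (*Struct)(nil) should report the same typ address, since a type normally has a single descriptor in the binary, while the word stays nil.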

Reflection

The reflect package works by extracting the type information from the interface value passed to reflect.TypeOf or reflect.ValueOf.

It's also worth noting that type information doesn't have to be hard-coded by the compiler; at the end of the day, it's all just data. This is how the reflect package is able to create new types dynamically at runtime (for example, with reflect.StructOf).

Type descriptors

There are different type descriptor structures for the different kinds of types Go offers (structs, pointers, slices, maps, and so on). All of them begin with a common header, _type.

For my version of Go, here's what the header looks like (from runtime/type.go):

type _type struct {
	size       uintptr
	ptrdata    uintptr // size of memory prefix holding all pointers
	hash       uint32
	tflag      tflag
	align      uint8
	fieldAlign uint8
	kind       uint8
	// function for comparing objects of this type
	// (ptr to object A, ptr to object B) -> ==?
	equal func(unsafe.Pointer, unsafe.Pointer) bool
	// gcdata stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, gcdata is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata    *byte
	str       nameOff
	ptrToThis typeOff
}

The kind field determines which kind-specific structure the header is embedded in; the pointer can then be reinterpreted as a pointer to that structure to access its extra fields. (The low five bits of kind hold the kind itself; the upper bits are flags, such as KindGCProg.) For example, here's what the struct type descriptor looks like for my version of Go:

type structtype struct {
	typ     _type
	pkgPath name
	fields  []structfield
}

It's worth noting that names are not encoded as ordinary Go strings: a name points to a flags byte followed by a length-prefixed string, optionally followed by tag data. See reflect/type.go for the details of the encoding.

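The nameOff (and typeOff) fields are 32-bit offsets rather than pointers. I believe they're offsets from the start of the .rodata section in memory; the runtime resolves them relative to the start of the module's type data (moduledata.types), which in a typical non-PIE ELF binary is the start of .rodata.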
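Putting those two notes together, a sketch of reading a name: resolve the nameOff against the bytes of .rodata (following the belief above that offsets are relative to its start), then decode the flags byte and the length-prefixed string. This assumes the varint-length encoding that I believe go1.17 uses; older releases used a fixed two-byte big-endian length, so check the comment in reflect/type.go for your version. nameAt and decodeName are hypothetical helpers, and the optional package-path offset that can follow the tag is ignored:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Flag bits in the first byte of an encoded name.
const (
	nameExported = 1 << 0 // the name is exported
	nameHasTag   = 1 << 1 // tag data follows the name
)

// nameAt returns the encoded name bytes a nameOff points to, assuming the
// offset is relative to the start of the .rodata data passed in.
func nameAt(rodata []byte, off int32) ([]byte, error) {
	if off < 0 || int(off) >= len(rodata) {
		return nil, errors.New("nameOff outside .rodata")
	}
	return rodata[off:], nil
}

// decodeName decodes a flags byte, a varint-length-prefixed name and, if the
// tag flag is set, a varint-length-prefixed tag.
func decodeName(b []byte) (name, tag string, exported bool, err error) {
	if len(b) == 0 {
		return "", "", false, errors.New("empty name data")
	}
	flags, rest := b[0], b[1:]
	readString := func() (string, error) {
		n, w := binary.Uvarint(rest)
		if w <= 0 || n > uint64(len(rest)-w) {
			return "", errors.New("truncated name data")
		}
		s := string(rest[w : w+int(n)])
		rest = rest[w+int(n):]
		return s, nil
	}
	if name, err = readString(); err != nil {
		return "", "", false, err
	}
	if flags&nameHasTag != 0 {
		if tag, err = readString(); err != nil {
			return "", "", false, err
		}
	}
	return name, tag, flags&nameExported != 0, nil
}

func main() {
	// Synthetic .rodata: two filler bytes, then an exported name "Struct".
	rodata := []byte{0xaa, 0xbb, nameExported, 6, 'S', 't', 'r', 'u', 'c', 't'}
	b, err := nameAt(rodata, 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	name, tag, exported, err := decodeName(b)
	fmt.Println(name, tag, exported, err)
}
```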

Statically extracting type information

Since these type descriptors aren't stored at fixed, well-known locations in memory, they can be difficult to extract.

One good approach is to scan the program's disassembly for calls to runtime.newobject. The first argument is the address of the type descriptor, and it is commonly hard-coded.

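This method should find a reasonably comprehensive list of type descriptors, though only for types that are heap-allocated via runtime.newobject; the interface-conversion pattern shown earlier is another source of hard-coded descriptor addresses.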