Notes on Go binary metadata
By default, Go binaries contain a wealth of information that can be immensely useful for reverse engineering. The encoding of this information depends on the Go compiler used, the compiler version, and the platform you’re targeting. For these notes, I’m using the official go1.17 compiler, targeting GOARCH=386 and GOOS=linux.
Function information
Go binaries contain symbol information and function metadata (including PC-to-line mappings). The Go runtime uses this information for, among other things, stack unwinding and printing stack traces.
At runtime, it’s possible to access this information by calling runtime.FuncForPC.
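As a minimal sketch of that runtime lookup, the program below asks the runtime for its caller's program counter and resolves it with runtime.FuncForPC. The helper name callerInfo is my own and purely illustrative.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerInfo returns the function name, source file, and line number of
// whoever called it, using the metadata embedded in the running binary.
// (callerInfo is a hypothetical helper, not part of any library.)
func callerInfo() (name, file string, line int) {
	pc, _, _, ok := runtime.Caller(1) // program counter of the caller
	if !ok {
		return "", "", 0
	}
	fn := runtime.FuncForPC(pc)
	if fn == nil {
		return "", "", 0
	}
	file, line = fn.FileLine(pc)
	return fn.Name(), file, line
}

func main() {
	name, file, line := callerInfo()
	fmt.Printf("%s at %s:%d\n", name, file, line)
}
```

The exact name and path printed depend on how and where the program is built.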
An easy way to access this information when reverse-engineering is by inspecting the binary with go tool objdump.
TEXT main.NewStruct(SB) /Users/danieltang/Desktop/godemo/test.go
test.go:15 0x80c12a0 658b0d00000000 MOVL GS:0, CX
test.go:15 0x80c12a7 8b89fcffffff MOVL 0xfffffffc(CX), CX
test.go:15 0x80c12ad 3b6108 CMPL 0x8(CX), SP
test.go:15 0x80c12b0 761d JBE 0x80c12cf
test.go:15 0x80c12b2 83ec08 SUBL $0x8, SP
test.go:16 0x80c12b5 8d05c0d20c08 LEAL 0x80cd2c0, AX
test.go:16 0x80c12bb 890424 MOVL AX, 0(SP)
test.go:16 0x80c12be e81d19f9ff CALL runtime.newobject(SB)
test.go:16 0x80c12c3 8b442404 MOVL 0x4(SP), AX
test.go:16 0x80c12c7 8944240c MOVL AX, 0xc(SP)
test.go:16 0x80c12cb 83c408 ADDL $0x8, SP
test.go:16 0x80c12ce c3 RET
test.go:15 0x80c12cf e8acd1fdff CALL runtime.morestack_noctxt(SB)
test.go:15 0x80c12d4 ebca JMP main.NewStruct(SB)
The tool prints each function’s name and source file, followed by its disassembly annotated with source line numbers.
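At a lower level, this data lives in the .gosymtab and .gopclntab sections. PC-to-line mappings are stored in .gopclntab. According to the debug/gosym documentation, the Go symbol table has not included symbol data since Go 1.3, so for go1.17 I’d expect the function names to come from .gopclntab as well.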
It’s possible to programmatically parse this information using the debug/gosym package. debug/elf also provides useful utilities for reading ELF files.
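Here is a rough sketch of how the two packages can be combined to list every function in an ELF Go binary. The helper name loadSymbols and the command-line handling are my own. It assumes the binary has .text and .gopclntab sections and that the debug/gosym in your toolchain understands the target's pclntab version.

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"errors"
	"fmt"
	"os"
)

// loadSymbols builds a gosym.Table from the .gopclntab (and, if present,
// .gosymtab) sections of the ELF file at path. Hypothetical helper.
func loadSymbols(path string) (*gosym.Table, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	text := f.Section(".text")
	pcln := f.Section(".gopclntab")
	if text == nil || pcln == nil {
		return nil, errors.New("missing .text or .gopclntab section")
	}
	pclnData, err := pcln.Data()
	if err != nil {
		return nil, err
	}
	// .gosymtab may be absent or empty; an empty slice is fine here.
	var symData []byte
	if s := f.Section(".gosymtab"); s != nil {
		if symData, err = s.Data(); err != nil {
			return nil, err
		}
	}
	return gosym.NewTable(symData, gosym.NewLineTable(pclnData, text.Addr))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: symdump <elf-binary>")
		return
	}
	table, err := loadSymbols(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// Print entry address, name, and source position of each function.
	for _, fn := range table.Funcs {
		file, line, _ := table.PCToLine(fn.Entry)
		fmt.Printf("%#x %s %s:%d\n", fn.Entry, fn.Name, file, line)
	}
}
```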
Type information
The Go compiler generates metadata about most (all?) types used in the program. It needs to generate this information in order for the runtime, reflection, and just about anything that uses interfaces to work.
Runtime
One of the aspects of the runtime that needs access to type information is the memory allocator. The type information is used to determine (at minimum) how much memory to allocate. I suspect the type information is also used for garbage collection (for discovering pointers), but I have not confirmed this.
To illustrate this, new(Struct) roughly compiles to this snippet:
test.go:16 0x80c12b5 8d05c0d20c08 LEAL 0x80cd2c0, AX
test.go:16 0x80c12bb 890424 MOVL AX, 0(SP)
test.go:16 0x80c12be e81d19f9ff CALL runtime.newobject(SB)
Conceptually, this looks something like runtime.newobject((*_type)(0x80cd2c0)).
The Go compiler has built up a structure containing type information, and stored it in the .rodata section of the binary. 0x80cd2c0 is the hard-coded address where the type information will be in memory when the program is run.
Interfaces
Conceptually, an interface is implemented as a structure that describes:
- The type of the value
- The value itself
While Go does not expose these details, it’s possible to see the underlying data structure by reading the code of the reflect package.
If a type will ever be assigned to an interface, the Go compiler will need to generate type information for it. Like above, it’ll store the type information in .rodata, and hard-code the address when doing an interface assignment.
To illustrate again, (interface{})((*Struct)(nil)) roughly compiles to:
test.go:20 0x80c12f2 8d0500600c08 LEAL 0x80c6000, AX
test.go:20 0x80c12f8 89442404 MOVL AX, 0x4(SP)
test.go:20 0x80c12fc c744240800000000 MOVL $0x0, 0x8(SP)
Conceptually, this looks like emptyInterface{typ: (*_type)(0x80c6000), word: nil} under the hood. As before, 0x80c6000 is the address containing the structure that describes the type.
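The two-word layout can be observed from inside a running program with unsafe. The sketch below relies on an implementation detail of the official compiler rather than any stable API. The eface mirror, the words helper, and the fields of Struct are all made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Struct is a stand-in for the type used in these notes; its fields are invented.
type Struct struct {
	A int32
	B string
}

// eface mirrors the two-word layout of an empty interface value.
// Assumption: this matches the gc toolchain's internal representation.
type eface struct {
	typ  unsafe.Pointer // address of the type descriptor
	word unsafe.Pointer // the value itself, or a pointer to it
}

// words returns the two machine words stored inside v.
func words(v interface{}) (typ, word unsafe.Pointer) {
	e := (*eface)(unsafe.Pointer(&v))
	return e.typ, e.word
}

func main() {
	typ, word := words((*Struct)(nil))
	// typ is the descriptor address; word is nil for a nil pointer.
	fmt.Printf("typ=%p word=%p\n", typ, word)
}
```

Note that the interface holding a nil *Struct still carries a non-nil type word, which is why such an interface value does not compare equal to nil.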
Reflection
The reflect package functions by extracting the type information out from the interface provided when calling reflect.TypeOf or reflect.ValueOf.
It’s also worth noting that the type information doesn’t need to be hardcoded by the compiler. At the end of the day, it’s all data. This is how the reflect package is able to dynamically create new types at runtime.
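As a small illustration, reflect.StructOf builds a struct type that never appears in the source, and the result can be inspected like any compiler-generated type. The helper names makePointType and fieldNames are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// makePointType builds, at runtime, a struct type equivalent to
// struct { X int; Y int }.
func makePointType() reflect.Type {
	intType := reflect.TypeOf(0)
	return reflect.StructOf([]reflect.StructField{
		{Name: "X", Type: intType},
		{Name: "Y", Type: intType},
	})
}

// fieldNames lists the field names recorded in a struct type's descriptor.
func fieldNames(t reflect.Type) []string {
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Name)
	}
	return names
}

func main() {
	t := makePointType()
	fmt.Println(t.Kind(), t.Size(), fieldNames(t))
}
```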
Type descriptors
There are different type descriptor structures for the different kinds of types that Go offers. All of them share a common header, _type.
For my version of Go, here’s what the header looks like:
type _type struct {
	size       uintptr
	ptrdata    uintptr // size of memory prefix holding all pointers
	hash       uint32
	tflag      tflag
	align      uint8
	fieldAlign uint8
	kind       uint8
	// function for comparing objects of this type
	// (ptr to object A, ptr to object B) -> ==?
	equal func(unsafe.Pointer, unsafe.Pointer) bool
	// gcdata stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, gcdata is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata    *byte
	str       nameOff
	ptrToThis typeOff
}
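To see these fields on a live value, one option is to mirror the header and read it through the type word of an interface. This is only a sketch: typeHeader, kindMask, and headerOf are my own names, the layout is copied from the go1.17 header above (with tflag, nameOff, and typeOff replaced by their underlying integer types), and it may not match other compiler versions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// typeHeader mirrors the go1.17 _type layout. Assumption: the running
// toolchain uses the same layout; this is not a stable API.
type typeHeader struct {
	size       uintptr
	ptrdata    uintptr
	hash       uint32
	tflag      uint8
	align      uint8
	fieldAlign uint8
	kind       uint8
	equal      func(unsafe.Pointer, unsafe.Pointer) bool
	gcdata     *byte
	str        int32
	ptrToThis  int32
}

// kindMask strips the flag bits stored in the upper part of the kind byte.
const kindMask = 1<<5 - 1

// headerOf returns the descriptor header for v's dynamic type.
func headerOf(v interface{}) *typeHeader {
	// An empty interface is two words; the first points at the descriptor.
	return *(**typeHeader)(unsafe.Pointer(&v))
}

func main() {
	h := headerOf(struct {
		A int32
		B *int
	}{})
	fmt.Printf("size=%d ptrdata=%d kind=%d align=%d\n",
		h.size, h.ptrdata, h.kind&kindMask, h.align)
}
```

The masked kind value lines up with the reflect.Kind constants, so 25 corresponds to reflect.Struct.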
The kind field is used to determine what the containing structure is, and the pointer is then reinterpreted to access the type-descriptor-specific fields. For example, here’s what the struct type descriptor looks like for my version of Go:
type structtype struct {
	typ     _type
	pkgPath name
	fields  []structfield
}
It’s worth noting that name strings are not encoded in the usual manner. See reflect/type.go for information on how they’re encoded.
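nameOff and typeOff values are offsets, rather than pointers. I believe they’re an offset from the address of the .rodata section in memory. In the runtime, resolveNameOff adds the offset to the start of the module’s type data (the types field of moduledata), which I believe coincides with the start of .rodata on this platform.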
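For reference, here is a sketch of a decoder for a single encoded name. It assumes the layout described in the comments of reflect/type.go for go1.17 as I understand it: one flag byte (bit 0 means exported), then a varint-encoded length, then the name bytes. I believe earlier releases used a fixed two-byte length instead, so treat this as version-specific. The function name decodeName is hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeName parses one encoded name: a flag byte, a varint length, then
// the name bytes. Assumption: this matches the go1.17 encoding.
func decodeName(b []byte) (name string, exported bool, err error) {
	if len(b) < 2 {
		return "", false, errors.New("name too short")
	}
	exported = b[0]&1 != 0
	n, w := binary.Uvarint(b[1:])
	if w <= 0 {
		return "", false, errors.New("bad length varint")
	}
	start := 1 + w
	if n > uint64(len(b)-start) {
		return "", false, errors.New("name length exceeds available data")
	}
	return string(b[start : start+int(n)]), exported, nil
}

func main() {
	// Hand-built example bytes: exported flag, length 3, "Foo".
	name, exported, err := decodeName([]byte{0x01, 0x03, 'F', 'o', 'o'})
	fmt.Println(name, exported, err)
}
```

To decode a name from a binary, you would first resolve its nameOff to a file offset and pass the bytes starting there.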
Statically extracting type information
Since these type descriptors aren’t stored at any known places in memory, they can be difficult to extract.
A good way is to scan through the disassembly of the program, and look for calls to runtime.newobject. The first argument will be the address of the type descriptor, and is commonly hard-coded.
This method should be able to find a reasonably comprehensive list of type descriptors.
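A crude version of this scan can be run over the text output of go tool objdump. The sketch below is a heuristic, not a robust disassembly analysis: it assumes the 386 pattern seen in the listings above, where the descriptor address is loaded with an absolute LEAL shortly before the call. The names lealRe, findTypeAddrs, and sample are my own.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// lealRe matches an instruction such as "LEAL 0x80cd2c0, AX" and captures
// the absolute address. Register-relative operands do not match.
var lealRe = regexp.MustCompile(`LEAL 0x([0-9a-f]+), [A-Z]+`)

// findTypeAddrs returns, for each call to runtime.newobject, the address
// loaded by the nearest preceding absolute LEAL in the same function.
func findTypeAddrs(disasm string) []uint64 {
	var addrs []uint64
	var last uint64
	have := false
	sc := bufio.NewScanner(strings.NewReader(disasm))
	for sc.Scan() {
		line := sc.Text()
		if m := lealRe.FindStringSubmatch(line); m != nil {
			if v, err := strconv.ParseUint(m[1], 16, 64); err == nil {
				last, have = v, true
			}
			continue
		}
		if have && strings.Contains(line, "CALL runtime.newobject(SB)") {
			addrs = append(addrs, last)
			have = false
		}
		// A new function starts; forget any pending address.
		if strings.HasPrefix(strings.TrimSpace(line), "TEXT ") {
			have = false
		}
	}
	return addrs
}

// sample is an excerpt of the objdump listing shown earlier in these notes.
const sample = `TEXT main.NewStruct(SB) /Users/danieltang/Desktop/godemo/test.go
test.go:15 0x80c12a7 8b89fcffffff MOVL 0xfffffffc(CX), CX
test.go:16 0x80c12b5 8d05c0d20c08 LEAL 0x80cd2c0, AX
test.go:16 0x80c12bb 890424 MOVL AX, 0(SP)
test.go:16 0x80c12be e81d19f9ff CALL runtime.newobject(SB)
test.go:15 0x80c12cf e8acd1fdff CALL runtime.morestack_noctxt(SB)`

func main() {
	for _, a := range findTypeAddrs(sample) {
		fmt.Printf("type descriptor at %#x\n", a)
	}
}
```

On other architectures, or with a register-based calling convention, the argument setup looks different and the pattern would need adjusting.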