You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

274 lines
13 KiB

This chapter introduces the core concepts in Go that are needed to follow this paper.
Go is a language similar to C, although with a few minor, but important differences.
First, Go is garbage collected, meaning that the programmer does not need to allocate
and free memory\footnote{although allocating memory is possible with the \mintinline{go}|new|
built-in function}. Secondly, Go does not allow for pointer arithmetic.
`Without pointer arithmetic it's possible to create a language that can never derive an
illegal address that succeeds incorrectly'\autocite{go-pointerarithmetic}.
Further, Go provides a built-in data type that does not exist in plain C: slices.
\section{Go Slices}\label{sec:go-slices}
As mentioned in Section~\ref{sec:why-go}, Go does not have a `list' implementation and lists are rarely used.
The reason for this
is twofold. Firstly, as Go does not have polymorphism, it is not possible for users to implement a generic
`list' type that would work with any underlying type. Secondly, the Go authors added `slices' as a core type
to the language. From a usage perspective, lists would not add anything compared to slices.
Go's Slices can be viewed as an abstraction over arrays, to mitigate some of the weaknesses of arrays
when compared to lists.
\begin{quote}
Arrays have their place, but they're a bit inflexible, so you don't see them too often in Go code.
Slices, though, are everywhere. They build on arrays to provide great power and convenience.\autocite{golang-slices}
\end{quote}
Slices can be visualised as a `struct' over an array:
\begin{gocode}
// NOTE: this type does not really exist, it
// is just to visualise how they are implemented.
type Slice struct {
// the underlying "backing store" array
array *[]T
// the length of the slice / view on the array
len int
// the capacity of the array from the
// starting index of the slice
cap int
}
\end{gocode}
With the `append' function, elements can be added to a slice. Should the underlying array not have enough
capacity left to store the new elements, a new array will be created and the data from the old array will
be copied into the new one. This happens transparently to the user.
\subsection{Using Slices}
`head', `tail' and `last' operations can be done with index expressions:
\begin{gocode}
// []<T> initialises a slice, while [n]<T> initialises an
// array, which is of fixed length n. We're only working
// with slices here.
s := []string{"first", "second", "third"}
head := s[0]
tail := s[1:]
last := s[len(s)-1]
\end{gocode}
Adding elements or joining slices is achieved with `append':
\begin{gocode}
s := []string{"first", "second"}
s = append(s, "third", "fourth")
t := []string{"fifth", "seventh"}
s = append(s, t...)
// to prepend an element, one has to create a
// slice out of that element
s = append([]string{"zeroth"}, s...)
\end{gocode}
Append is a variadic function, meaning it takes \textit{n} elements. If the slice is of type \mintinline{go}|[]T|,
the appended elements have to be of type \mintinline{go}|T|.
To join two lists, the second list is expanded into
variadic arguments.
More complex operations like removing elements, inserting elements in the middle or finding
elements in a slice require helper functions, which have also been documented in Go's
Slice Tricks\autocite{slice-tricks}.
\subsection{What is missing from Slices}
This quick glance at slices should clarify that, though the runtime characteristics of lists and slices
can differ, from a usage standpoint, what is possible with lists is also possible with slices.
In a typical program written in a functional language, lists take a central role\footnote{Interestingly,
there is no clear and direct answer as to why they do. One reason may be because they are recursively
defined and trivially to implement functionally. Further, they are easier to use than arrays, where
the programmer would need to track the index and bound of the array (imagine keeping track of the
indices in a recursive function)\autocite{why-lists}}. Because of this, functional languages have
a number of helper functions like `map', `filter' and `fold'\autocite{haskell-list-funcs} to modify and
work on lists. These so called `higher order functions' currently do
not exist in Go and would need to be implemented by the programmer. With no support for polymorphism, a
different implementation would need to be written for every slice-type that is used. The type \mintinline{go}|[]int|
(read: a slice of integers) differs from \mintinline{go}|[]string|, which means that a possible
`map' function would have to be written once to support slices of integers, once to support slices
of strings, and a combination of these two:
\begin{gocode}
func mapIntToInt(f func(int) int, []int) []int
func mapIntToString(f func(int) string, []int) []string
func mapStringToInt(f func(string) int, []string) []int
func mapStringToString(f func(string) string, []string) []string
\end{gocode}
With 7 base types (eliding the different `int' types like `int8', `uint16`, `int16', etc.), this would
mean $7^{2} = 49$ map functions just to cover the base types. Counting the different numeric
types into that equation (totally 19 distinct types\autocite{go-basetypes}), would grow that number to $19^{2} = 361$ functions.
Though this code could be generated, it misses user-defined types which would still
need to be generated separately in a pre-compile step.
Another option, instead of having a function per type, would be that `map' takes and returns empty interfaces
(\mintinline{go}|interface{}|). However, `the empty interface says
nothing'\autocite{empty-interface}. The declaration of `map' would be:
\begin{gocode}
func map(f func(interface{}) interface{}, interface{}) interface{}
\end{gocode}
This function header does not say anything about it's types, which would
mean that they would need to be checked at runtime and handled gracefully. It
would also require the caller to do a type assertion after every call. Further,
a slice of type \mintinline{go}|T| cannot simply be converted or asserted to a slice of another type
\mintinline{go}|T2|\autocite{go-interface-slice-conv}\autocite{go-interface-slice-conv2}.
Because of this limitation, a typical usage pattern of map would be:
\begin{gocode}
s := []string{"hello", "world"}
var i []interface{}
for _, e := range s {
i = append(i, e)
}
// i is now populated and can be used.
r := map(someFunc, f)
// to convert it back to []string:
s = r.([]string)
\end{gocode}
This exemplifies why using the empty interface is not an option. Further, the
function could not really be type-checked at compile time, as there is no indication
of which argument's type must be equal.
\section{Built-in functions}
To mitigate these issues, the most common list-operations (in Go slice-operations) will
be added as built-ins to the compiler, so that the programmer can use these functions
on every slice-type without any conversion or code generation being necessary.
The language specification defines what built-in functions are and which built-in
functions should exist:
\begin{quote}
Built-in functions are predeclared. They are called like any other function
but some of them accept a type instead of an expression as the first argument.
The built-in functions do not have standard Go types, so they can only appear
in call expressions; they cannot be used as function values.\autocite{go-spec-builtins}
\end{quote}
For example, the documentation for the built-in \mintinline{go}|append|:
\begin{gocode}
// The append built-in function appends elements to the end of a slice.
// ...
func append(slice []Type, elems ...Type) []Type
\end{gocode}
The documentation shows that the types supplied to append are not specified upfront.
Instead, they are resolved and checked during compilation of the program.
Thus, in order to have generic list processing functions, these functions need to
be implemented as built-ins in the compiler.
\section{The Go Compiler}
The Go programming language is defined by its specification\autocite{go-spec}, and not
it's implementation. As of Go 1.14, there are two major implementations of that
specification; Google's self-hosting compiler toolchain `gc', which is written in
Go, and `gccgo', a front end for GCC, written in C++.
When talking about the Go compiler, what's mostly referred to is `gc'\footnote{`gc' stands
for `go compiler', and not `garbage collection' (which is abbreviated as `GC').}.
A famous, although not completely correct story tells about Go being designed
while a C++ program was compiling\autocite{less-is-more}.
This is why one of the main goals when designing Go was fast compilation times:
\begin{quote}
Finally, working with Go is intended to be fast: it should take at most a few
seconds to build a large executable on a single computer. To meet these goals
required addressing a number of linguistic issues: an expressive but lightweight
type system; concurrency and garbage collection; rigid dependency specification;
and so on. These cannot be addressed well by libraries or tools; a new language
was called for.\autocite{go-faq}
\end{quote}
Go has taken some measures to combat slow compilation times. In general, Go's dependency resolution is simpler
compared to other languages, for example by not allowing circular dependencies.
Furthermore, compilation is not even attempted if there are unused
imports or unused declarations of variables, types and functions.
This leads to less code to compile and in turn shorter compilation times.
Another reason is that `the `gc' compiler is simpler code compared to most
recent compilers'\autocite{nuts-compiler}. However, according
to Rob Pike, one of the creators of Go, Go's compiler is not notably fast, but
most other compilers are slow:
\begin{quote}
The compiler hasn't even been properly tuned for speed. A truly fast compiler
would generate the same quality code much faster.\autocite{nuts-compiler}
\end{quote}
The code generation with the `gc' compiler is split into four phases:
\subsection{Parsing Go programs}
\newglossaryentry{dag}{name=DAG, description={Directed Acyclic Graph}}
\newglossaryentry{ast}{name=AST,description={Abstract Syntax Tree, an abstract representation of source code as a tree}}
The first phase of compilation is parsing Go programs into a syntax tree. This
is done by tokenising the code (`lexical analysis' - the `lexer'), parsing
(`syntax analysis' - the `parser') it and then constructing a syntax tree
(\gls{ast})\footnote{Technically, the syntax tree is a syntax \gls{dag}\autocite{ast-node-dag}}
for each source file.
Go has been designed to be easy to parse and analyse. For example, in contrast
to many other programming languages,
Go does not require a symbol table to parse source code.\autocite{go-faq-symbol}.
\subsection{Type-checking and AST-transformation}\label{sec:comp-type}
The second phase of compilation starts by converting the `syntax' package's
AST, created in the first phase, to the compiler's AST representation. This
is due to historical reasons, as gc's AST definition was carried over
from the C implementation.
After the conversion, the AST is type-checked. Within the type-checking, there
are also some additional steps included like checking for `declared and not used'
variables and determining whether a function terminates.
After type-checking, transformations are applied on the AST. This includes
eliminating dead code, inlining function calls and escape analysis, to name a few.
What is also done in the transformation phase is rewriting built-in function
calls, replacing for example a call to the built-in `append' with the necessary
AST structure and runtime-calls to implement its functionality.
\subsection{SSA}
\newglossaryentry{ssa}{name=SSA,description={Single Static Assignment, an intermediate representation between the AST and the compiled binary that simplifies and improves compiler optimisations}}
In the third phase, the AST is converted to \gls{ssa} form. SSA (Single Static Assignment) is `a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it'\autocite{compiler-readme}.
The conversion consists of multiple `passes' through the SSA that
apply machine-independent rules to optimise code. These generic
rewrite rules are applied on every architecture and thus mostly
concern expressions (for example replacing expressions with constant values and
optimising multiplications), dead code elimination and removal of unneeded
nil-checks.
\subsection{Generating machine code}
Lastly, in the fourth phase of the compilation, machine-specific
SSA optimisations are applied. These may include:
\begin{itemize}
\item Rewriting generic values into their machine-specific variants
(for example, on amd64, combining load-store operations)
\item Dead-code elimination
\item Pointer liveness analysis
\item Removing unused local variables
\end{itemize}
After generating and optimising the SSA form, the code is passed to the
assembler, which replaces the so far generic instructions with
architecture-specific machine code and writes out the final object file\autocite{compiler-readme}.