You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
274 lines
13 KiB
274 lines
13 KiB
This chapter introduces the core concepts in Go that are needed to follow this paper.
|
|
Go is a language similar to C, although with a few minor, but important differences.
|
|
First, Go is garbage collected, meaning that the programmer does not need to allocate
|
|
and free memory\footnote{although allocating memory is possible with the \mintinline{go}|new|
|
|
built-in function}. Secondly, Go does not allow for pointer arithmetic.
|
|
`Without pointer arithmetic it's possible to create a language that can never derive an
|
|
illegal address that succeeds incorrectly'\autocite{go-pointerarithmetic}.
|
|
|
|
Further, Go provides a built-in data type that does not exist in plain C: slices.
|
|
|
|
\section{Go Slices}\label{sec:go-slices}
|
|
|
|
As mentioned in Section~\ref{sec:why-go}, Go does not have a `list' implementation and lists are rarely used.
|
|
The reason for this
|
|
is twofold. Firstly, as Go does not have polymorphism, it is not possible for users to implement a generic
|
|
`list' type that would work with any underlying type. Secondly, the Go authors added `slices' as a core type
|
|
to the language. From a usage perspective, lists would not add anything compared to slices.
|
|
|
|
Go's Slices can be viewed as an abstraction over arrays, to mitigate some of the weaknesses of arrays
|
|
when compared to lists.
|
|
|
|
\begin{quote}
|
|
Arrays have their place, but they're a bit inflexible, so you don't see them too often in Go code.
|
|
Slices, though, are everywhere. They build on arrays to provide great power and convenience.\autocite{golang-slices}
|
|
\end{quote}
|
|
|
|
Slices can be visualised as a `struct' over an array:
|
|
|
|
\begin{gocode}
|
|
// NOTE: this type does not really exist, it
|
|
// is just to visualise how they are implemented.
|
|
type Slice struct {
|
|
// the underlying "backing store" array
|
|
array *[]T
|
|
// the length of the slice / view on the array
|
|
len int
|
|
// the capacity of the array from the
|
|
// starting index of the slice
|
|
cap int
|
|
}
|
|
\end{gocode}
|
|
|
|
With the `append' function, elements can be added to a slice. Should the underlying array not have enough
|
|
capacity left to store the new elements, a new array will be created and the data from the old array will
|
|
be copied into the new one. This happens transparently to the user.
|
|
|
|
\subsection{Using Slices}
|
|
|
|
`head', `tail' and `last' operations can be done with index expressions:
|
|
|
|
\begin{gocode}
|
|
// []<T> initialises a slice, while [n]<T> initialises an
|
|
// array, which is of fixed length n. We're only working
|
|
// with slices here.
|
|
s := []string{"first", "second", "third"}
|
|
head := s[0]
|
|
tail := s[1:]
|
|
last := s[len(s)-1]
|
|
\end{gocode}
|
|
|
|
Adding elements or joining slices is achieved with `append':
|
|
|
|
\begin{gocode}
|
|
s := []string{"first", "second"}
|
|
s = append(s, "third", "fourth")
|
|
t := []string{"fifth", "seventh"}
|
|
s = append(s, t...)
|
|
// to prepend an element, one has to create a
|
|
// slice out of that element
|
|
s = append([]string{"zeroth"}, s...)
|
|
\end{gocode}
|
|
|
|
Append is a variadic function, meaning it takes \textit{n} elements. If the slice is of type \mintinline{go}|[]T|,
|
|
the appended elements have to be of type \mintinline{go}|T|.
|
|
|
|
To join two lists, the second list is expanded into
|
|
variadic arguments.
|
|
|
|
More complex operations like removing elements, inserting elements in the middle or finding
|
|
elements in a slice require helper functions, which have also been documented in Go's
|
|
Slice Tricks\autocite{slice-tricks}.
|
|
|
|
\subsection{What is missing from Slices}
|
|
|
|
This quick glance at slices should clarify that, though the runtime characteristics of lists and slices
|
|
can differ, from a usage standpoint, what is possible with lists is also possible with slices.
|
|
|
|
In a typical program written in a functional language, lists take a central role\footnote{Interestingly,
|
|
there is no clear and direct answer as to why they do. One reason may be because they are recursively
|
|
defined and trivially to implement functionally. Further, they are easier to use than arrays, where
|
|
the programmer would need to track the index and bound of the array (imagine keeping track of the
|
|
indices in a recursive function)\autocite{why-lists}}. Because of this, functional languages have
|
|
a number of helper functions like `map', `filter' and `fold'\autocite{haskell-list-funcs} to modify and
|
|
work on lists. These so called `higher order functions' currently do
|
|
not exist in Go and would need to be implemented by the programmer. With no support for polymorphism, a
|
|
different implementation would need to be written for every slice-type that is used. The type \mintinline{go}|[]int|
|
|
(read: a slice of integers) differs from \mintinline{go}|[]string|, which means that a possible
|
|
`map' function would have to be written once to support slices of integers, once to support slices
|
|
of strings, and a combination of these two:
|
|
|
|
\begin{gocode}
|
|
func mapIntToInt(f func(int) int, []int) []int
|
|
func mapIntToString(f func(int) string, []int) []string
|
|
func mapStringToInt(f func(string) int, []string) []int
|
|
func mapStringToString(f func(string) string, []string) []string
|
|
\end{gocode}
|
|
|
|
With 7 base types (eliding the different `int' types like `int8', `uint16`, `int16', etc.), this would
|
|
mean $7^{2} = 49$ map functions just to cover the base types. Counting the different numeric
|
|
types into that equation (totally 19 distinct types\autocite{go-basetypes}), would grow that number to $19^{2} = 361$ functions.
|
|
|
|
Though this code could be generated, it misses user-defined types which would still
|
|
need to be generated separately in a pre-compile step.
|
|
|
|
Another option, instead of having a function per type, would be that `map' takes and returns empty interfaces
|
|
(\mintinline{go}|interface{}|). However, `the empty interface says
|
|
nothing'\autocite{empty-interface}. The declaration of `map' would be:
|
|
\begin{gocode}
|
|
func map(f func(interface{}) interface{}, interface{}) interface{}
|
|
\end{gocode}
|
|
|
|
This function header does not say anything about it's types, which would
|
|
mean that they would need to be checked at runtime and handled gracefully. It
|
|
would also require the caller to do a type assertion after every call. Further,
|
|
a slice of type \mintinline{go}|T| cannot simply be converted or asserted to a slice of another type
|
|
\mintinline{go}|T2|\autocite{go-interface-slice-conv}\autocite{go-interface-slice-conv2}.
|
|
Because of this limitation, a typical usage pattern of map would be:
|
|
\begin{gocode}
|
|
s := []string{"hello", "world"}
|
|
var i []interface{}
|
|
for _, e := range s {
|
|
i = append(i, e)
|
|
}
|
|
// i is now populated and can be used.
|
|
r := map(someFunc, f)
|
|
// to convert it back to []string:
|
|
s = r.([]string)
|
|
\end{gocode}
|
|
|
|
This exemplifies why using the empty interface is not an option. Further, the
|
|
function could not really be type-checked at compile time, as there is no indication
|
|
of which argument's type must be equal.
|
|
|
|
\section{Built-in functions}
|
|
|
|
To mitigate these issues, the most common list-operations (in Go slice-operations) will
|
|
be added as built-ins to the compiler, so that the programmer can use these functions
|
|
on every slice-type without any conversion or code generation being necessary.
|
|
|
|
The language specification defines what built-in functions are and which built-in
|
|
functions should exist:
|
|
\begin{quote}
|
|
Built-in functions are predeclared. They are called like any other function
|
|
but some of them accept a type instead of an expression as the first argument.
|
|
|
|
The built-in functions do not have standard Go types, so they can only appear
|
|
in call expressions; they cannot be used as function values.\autocite{go-spec-builtins}
|
|
\end{quote}
|
|
|
|
For example, the documentation for the built-in \mintinline{go}|append|:
|
|
\begin{gocode}
|
|
// The append built-in function appends elements to the end of a slice.
|
|
// ...
|
|
func append(slice []Type, elems ...Type) []Type
|
|
\end{gocode}
|
|
|
|
The documentation shows that the types supplied to append are not specified upfront.
|
|
Instead, they are resolved and checked during compilation of the program.
|
|
|
|
Thus, in order to have generic list processing functions, these functions need to
|
|
be implemented as built-ins in the compiler.
|
|
|
|
\section{The Go Compiler}
|
|
|
|
The Go programming language is defined by its specification\autocite{go-spec}, and not
|
|
it's implementation. As of Go 1.14, there are two major implementations of that
|
|
specification; Google's self-hosting compiler toolchain `gc', which is written in
|
|
Go, and `gccgo', a front end for GCC, written in C++.
|
|
|
|
When talking about the Go compiler, what's mostly referred to is `gc'\footnote{`gc' stands
|
|
for `go compiler', and not `garbage collection' (which is abbreviated as `GC').}.
|
|
|
|
A famous, although not completely correct story tells about Go being designed
|
|
while a C++ program was compiling\autocite{less-is-more}.
|
|
This is why one of the main goals when designing Go was fast compilation times:
|
|
\begin{quote}
|
|
Finally, working with Go is intended to be fast: it should take at most a few
|
|
seconds to build a large executable on a single computer. To meet these goals
|
|
required addressing a number of linguistic issues: an expressive but lightweight
|
|
type system; concurrency and garbage collection; rigid dependency specification;
|
|
and so on. These cannot be addressed well by libraries or tools; a new language
|
|
was called for.\autocite{go-faq}
|
|
\end{quote}
|
|
|
|
Go has taken some measures to combat slow compilation times. In general, Go's dependency resolution is simpler
|
|
compared to other languages, for example by not allowing circular dependencies.
|
|
Furthermore, compilation is not even attempted if there are unused
|
|
imports or unused declarations of variables, types and functions.
|
|
This leads to less code to compile and in turn shorter compilation times.
|
|
Another reason is that `the `gc' compiler is simpler code compared to most
|
|
recent compilers'\autocite{nuts-compiler}. However, according
|
|
to Rob Pike, one of the creators of Go, Go's compiler is not notably fast, but
|
|
most other compilers are slow:
|
|
|
|
\begin{quote}
|
|
The compiler hasn't even been properly tuned for speed. A truly fast compiler
|
|
would generate the same quality code much faster.\autocite{nuts-compiler}
|
|
\end{quote}
|
|
|
|
The code generation with the `gc' compiler is split into four phases:
|
|
|
|
\subsection{Parsing Go programs}
|
|
|
|
\newglossaryentry{dag}{name=DAG, description={Directed Acyclic Graph}}
|
|
\newglossaryentry{ast}{name=AST,description={Abstract Syntax Tree, an abstract representation of source code as a tree}}
|
|
|
|
The first phase of compilation is parsing Go programs into a syntax tree. This
|
|
is done by tokenising the code (`lexical analysis' - the `lexer'), parsing
|
|
(`syntax analysis' - the `parser') it and then constructing a syntax tree
|
|
(\gls{ast})\footnote{Technically, the syntax tree is a syntax \gls{dag}\autocite{ast-node-dag}}
|
|
for each source file.
|
|
|
|
Go has been designed to be easy to parse and analyse. For example, in contrast
|
|
to many other programming languages,
|
|
Go does not require a symbol table to parse source code.\autocite{go-faq-symbol}.
|
|
|
|
\subsection{Type-checking and AST-transformation}\label{sec:comp-type}
|
|
|
|
The second phase of compilation starts by converting the `syntax' package's
|
|
AST, created in the first phase, to the compiler's AST representation. This
|
|
is due to historical reasons, as gc's AST definition was carried over
|
|
from the C implementation.
|
|
|
|
After the conversion, the AST is type-checked. Within the type-checking, there
|
|
are also some additional steps included like checking for `declared and not used'
|
|
variables and determining whether a function terminates.
|
|
|
|
After type-checking, transformations are applied on the AST. This includes
|
|
eliminating dead code, inlining function calls and escape analysis, to name a few.
|
|
What is also done in the transformation phase is rewriting built-in function
|
|
calls, replacing for example a call to the built-in `append' with the necessary
|
|
AST structure and runtime-calls to implement its functionality.
|
|
|
|
\subsection{SSA}
|
|
|
|
\newglossaryentry{ssa}{name=SSA,description={Single Static Assignment, an intermediate representation between the AST and the compiled binary that simplifies and improves compiler optimisations}}
|
|
In the third phase, the AST is converted to \gls{ssa} form. SSA (Single Static Assignment) is `a
|
|
lower-level intermediate representation with specific properties that make it
|
|
easier to implement optimizations and to eventually generate machine code from
|
|
it'\autocite{compiler-readme}.
|
|
|
|
The conversion consists of multiple `passes' through the SSA that
|
|
apply machine-independent rules to optimise code. These generic
|
|
rewrite rules are applied on every architecture and thus mostly
|
|
concern expressions (for example replacing expressions with constant values and
|
|
optimising multiplications), dead code elimination and removal of unneeded
|
|
nil-checks.
|
|
|
|
\subsection{Generating machine code}
|
|
|
|
Lastly, in the fourth phase of the compilation, machine-specific
|
|
SSA optimisations are applied. These may include:
|
|
\begin{itemize}
|
|
\item Rewriting generic values into their machine-specific variants
|
|
(for example, on amd64, combining load-store operations)
|
|
\item Dead-code elimination
|
|
\item Pointer liveness analysis
|
|
\item Removing unused local variables
|
|
\end{itemize}
|
|
|
|
After generating and optimising the SSA form, the code is passed to the
|
|
assembler, which replaces the so far generic instructions with
|
|
architecture-specific machine code and writes out the final object file\autocite{compiler-readme}.
|