bachelor-thesis/thesis/chapters/32_funcheck.tex

To learn functional programming without a purely functional language, the
developer needs to know which statements are functional and which
would not exist or be possible in a purely functional language.
For this reason, a `functional checker' should be created. It should work
in a similar fashion to linters like `shellcheck', `go vet' or `gosimple'\footnote{
    A list of Go linters can be found in golangci-lint\autocite{golangci-lint}.
}, with different `rules' that are reported upon. This tool will be named `funcheck'.

A set of rules should be compiled to identify common `non-functional' constructs
which should then be reported upon.

The first step is to define a set of rules.

\subsection{Functional Purity}\label{sec:func-purity}

The goal of funcheck is to report every construct that can not be considered
to be purely functional. As there is no agreed upon definition on what
functional purity is\autocite{functional-controversy}, the first step is to
define functional purity for the context of this thesis:

\begin{quote}
    A style of building the structure and elements of computer programs that
    treats all computation as the evaluation of mathematical functions. Purely
    functional programming may also be defined by forbidding changing state and
    mutable data.

    Purely functional programming consists in ensuring that functions, inside
    the functional paradigm, will only depend on their arguments, regardless of
    any global or local state.\autocite{functional-purity-wiki}
\end{quote}

This definition roughly brakes down into two parts; immutability and function
purity. These parts and how they translate to Go will be discussed in the
following chapters.

\subsection{Forbidding mutability and changing state}\label{sec:mutability}

\newglossaryentry{copy-by-value}{name=copy by value, description={Copy by value
    refers to the argument-passing style where the supplied arguments are
    copied. To pass references in Go, the developer needs to use pointers
instead}}

Immutability refers to objects --- variables --- that are unchangeable. This means
that their state cannot be modified after it's creation and first assignment.

Many programming languages provide features for making variables immutable.

In Rust for example, variables are immutable by default. To explicitly declare
a mutable variable, the \mintinline{rust}|mut| keyword has to be used: \mintinline{rust}|let mut x = 5;|.

Java, in contrast, uses the \mintinline{java}|final| keyword:
\begin{quote}
    A final variable can only be initialized once\autocite{final-java}
\end{quote}
`Only be initialized once' means that it cannot be reassigned later, effectively
making that variable immutable.
A caveat with Java's \mintinline{java}|final| keyword is that the immutability
is only tied to the reference of that object, not
it's contents (one may still change a field of an immutable object).

C and C++ have two features to achieve immutability, the \mintinline{c}|#define|
preprocessor and the \mintinline{c}|const| keyword.
The \mintinline{c}|#define| directive does not declare variables, but rather
works in a similar way to string replacement at compile time. These directives
can also be expressions. For example, this is a valid define statement:
\begin{ccode}
#define AGE (20/2)
\end{ccode}

However, because \mintinline{c}|#define| is just text substitution, these
identifiers are untyped.

The \mintinline{c}|const| qualifier specifies that a variable's value
will not be changed, making it immutable. This is effectively the equivalent
to Java's \mintinline{java}|final|.

Go has the \mintinline{go}|const| keyword too, and similar to C, constants can only be
characters, strings, booleans or numeric values\footnote{In contrast to C, Go does
have a boolean type}. A complex type --- a struct, interface or function --- cannot be constant.

This means that the programmer cannot write
\mintinline{go}|const x = MyStruct{A: a, B: b}| to make a struct immutable.

A solution to making variables
immutable is to explicitly not allow any mutations in a program, effectively
disallowing all reassignments.
To do this in Go, the simplest solution is
to disallow the assignment operator \mintinline{go}|=|, and only
allow it in combination with a declaration (\mintinline{go}|:=|).

This solution also has the side-effect that it makes it impossible to mutate
existing, complex data structures like maps and slices\footnote{By not
    allowing assignments, elements can not be updated by
\mintinline{go}|s[0] = "zero"| / \mintinline{go}|m["key"] = "value"|.}.

There are additional operators that work in a similar way to the assignment
operator, for example \mintinline{go}|++|, \mintinline{go}|--| and
\mintinline{go}|+=|. These are all `hidden' assignments and will need to
be reported upon too.

Go being a \gls{copy-by-value} language, mutating existing variables
across functions works by passing pointers to these variables\footnote{
An example for this can be found in Appendix~\ref{appendix:mutation}.}.
When removing the regular assignment operators, the usage of pointers becomes
second nature, as it cannot be updated either way.
Technically, the only advantage that pointers have left, is that less memory has
to be copied\footnote{The pointer still has to be passed, and thus copied, to the
called function} if a variable's type is big enough\footnote{For example, it is
	cheaper to copy an \mintinline{go}|int32| than to take address of it, copy that
(64 bit, on most systems) address and dereferencing it again}.
This optimisation is left to the developer.

In functional languages in contrast, this optimisation is usually left to the compiler
and there are no pointers at all in the language\footnote{Except for things like the `foreign
function interface' (FFI) in Haskell, which allows to call C functions from Haskell code}.

A feature not discussed so far is shadowing. Shadowing a variable in a different
scope is still possible by redeclaring it. As expected, even with the above
mentioned limitations, the rules for shadowing variables do not change. This
can be seen in Appendix~\ref{appendix:shadowing}.

\subsection{Pure functions}

A pure function is a function where the return value is only dependent on
the arguments. With the same arguments, the function should always return
the same values. The evaluation of the function is also not allowed to have
any `side effects', meaning that it should not mutate non-local variables or
references\footnote{This also includes `local static' variables, but Go does
not have a way to make variables `static'.}.

As discussed in the last Chapter~\ref{sec:mutability}, if reassignments
are not allowed, mutating global variables is not possible either.

If global state cannot be changed, it also cannot influence a function's return values.

As such, forbidding reassignments is an extremely simple solution to
ensure functional purity within the program. However, there is one
important aspect that the solution does not solve: interacting with the outside world.

\subsection{IO}
Network and file access, time, randomness, user input and sensors are all examples
of IO, things that interact with state outside of the program.

Input and Output in a program --- through whether channel it may be --- is
impure by design. Due to this, the Haskell language authors faced a difficult
challenge; how to add impure functions to an otherwise completely pure
language.

In a pure language, functions only depend on their arguments, have
no side effects and return a value computed from these arguments. Because
of this, the execution order of functions is not relevant, a function
can be executed whenever its return value is needed.
Due to this, the compiler can optimise the code, calling the functions
in a specific (or in no specific) order or omitting calls if their return
values are unused.

However, impure IO functions introduce some difficulties.

For example, one would expect that it is possible to write a trivial `writeFile'
function that takes a filename and a string with the contents for that file with
the function definition roughly being:

\begin{haskellcode}
writeFile :: String -> String
\end{haskellcode}
Supposed that this function could be called with
\mintinline{haskell}|writeFile "hello.txt" "Hello World"|. The function
does not return anything, or its return value, for example the number
of bytes that have been written to the file, is discarded on the call site.

As there is no dependency on that function anymore, the compiler assumes that
this function does not need to be executed at all --- it still expects all
functions to be pure (even if this is not the case in this example).
Being able to follow the purity guarantees, the compiler emits the call
to the writeFile function and the file never gets written.

Similarly, for getting user input:
\begin{haskellcode}
getChar :: Char
\end{haskellcode}
If this function is called multiple times, the compiler may reorder the
execution of these calls, again, because it expects the functions to be
pure, so the execution order should not matter:
\begin{haskellcode}
get2Chars = [getChar, getChar]
\end{haskellcode}
However, in this example, execution order is extremely important.

To solve this problem, Haskell introduced the IO monad, in which
all functions are wrapped with a `RealWorld' argument. So the
\mintinline{haskell}|getChar| example would be written as\footnote{
	This is not the actual function header for getChar, only an illustration.
	The actual function header of getChar is \mintinline{haskell}|getChar :: IO Char|.
}:

\begin{haskellcode}
getChar :: RealWorld -> (Char, RealWorld)
\end{haskellcode}
This can be read as `take the current RealWorld, do something with it,
and return a Char and a (possibly changed) RealWorld'\autocite{haskell-io}.

With this solution, the compiler cannot reorder or omit IO actions, ensuring
the correct execution of the program.

The IO monad is a sophisticated solution to work around the purity guarantees and
enables Haskellers to use impure functions in an otherwise completely pure language.
Put another way: the IO monad exists because of compiler optimisations, which in turn are
based on the assumption that everything is pure.

In Go, the compiler does not and can not make this assumption. There also
is no `pure' keyword to mark a function as such. For that reason, the
compiler cannot optimise as aggressively, and the whole IO problem does
not exist. However, in the context of this thesis, interacting with the outside world ---
using IO-functions --- means that functional purity cannot be guaranteed anymore.
Haskell's solution to this issue is relatively complex, and while it may be applicable
to Go, it would require massively rewriting Go's standard library. Furthermore,
the IO monad does not make impure functions magically pure. Rather, it
clearly marks functions as impure.

Another issue, apart from the amount of work that would be needed, is that
the goal is to provide an easy introduction to functional programming.
IO, as one may see in these examples, is one of the hardest topics.

For those reasons, although it may violate the functional purity requirements,
the underlying issue with IO will be ignored. This means that the programmer
will still be able to use IO functions as in regular Go programs.

\subsection{Assignments in Go}

As described in Section~\ref{sec:mutability}, the only check that needs
to be done is that there are no reassignments in the code. There are a few
exceptions to this rule, which will be covered later. First, it is important to have
a few examples and test cases to be clear on what should be reported.

To declare variables in Go, the \mintinline{go}|var| keyword is used.
The var keyword needs to be followed by one or more identifiers and maybe
types or the initial value. These are all valid declarations:

\begin{listing}
	\begin{gocode}
var i int
var U, V, W float64
var k = 1
var x, y float32 = -1, -2
	\end{gocode}
	\caption{Go Variable Declarations}
\end{listing}
What can be seen in this example is that the type of the variable can be
deducted automatically\footnote{The compiler will infer the type at compile
time. That operation is always successful, although it may not be what
the programmer desires. For example, \mintinline{go}|var x = 5| will
always result in \mintinline{go}|x| to be of type \mintinline{go}|int|.},
or be specified explicitly.

However, the notation \mintinline{go}|var k = 1| is seldomly seen. This
is due to the fact that there is a second way to declare and initialise
variables to a specific value, the `short variable declaration' syntax:

\begin{gocode}
k := 1
\end{gocode}
\begin{quote}
It is shorthand for a regular variable declaration with initializer expressions but no types:

\begin{bnfcode}
"var" IdentifierList = ExpressionList .
\end{bnfcode}
\autocite{short-hand-decl}
\end{quote}

In practice, the short variable declaration is used more often due to the
fact that there is no need to specify the type of the variable manually and
is shorter than its counterpart \mintinline{go}|var x = y|.

Go also has increment, decrement and various other operators to reassign
variables. Most of the operators that take a `left' and `right' expression
also have a short-hand notation to reassign the value back to the variable,
for example\footnote{This is a non-exhaustive list of operators. A complete listing
of operators can be found in the Go language spec\autocite{spec-operators}.}:


\begin{listing}
\begin{gocode}
var x = 5
x += 5 // equal to x = x + 5
x %= 3 // equal to x = x % 3
x <<= 2 // equal to x = x << 2 (bit-shift)
\end{gocode}
	\caption{Go Assignment Operators}
\end{listing}
All these operators are reassigning values and should be reported.

However, as mentioned in the beginning of this chapter, there are a few exceptions
to this rule:

\subsubsection{Function Assignments}\label{sec:func-reassign}

A variable declaration brings an identifier into scope. The Go Language Specification
defines this scope according to the following rule:
\begin{quote}
    The scope of a constant or variable identifier declared inside a function begins
    at the end of the ConstSpec or VarSpec (ShortVarDecl for short variable
    declarations) and ends at the end of the innermost containing block.
\autocite{spec-scope}\end{quote}

This definition holds an important caveat for function definitions:
\begin{quote}
    The scope [...] begins \textbf{at the end} of the ConstSpec or VarSpec [...].
\end{quote}

This means that when defining a function, the functions identifier is not
in scope within that function:

\begin{listing}
\begin{gocode}
func X() {
	f := func() {
		f() // <- 'f' is not in scope yet.
	}
}
\end{gocode}
	\caption{Go scoping issue with recursive functions}
\end{listing}
In the above example, the function \mintinline{go}|f| cannot be called recursively. This is due
to the fact that the identifier \mintinline{go}|f| is not in scope at \mintinline{go}|f|'s definition.
Instead, it only enters scope after the closing brace of function \mintinline{go}|f|.

This cannot be changed within the compiler without touching a lot of the scoping
rules and changing these scoping rules may have unintended side effects. To exemplify,
a naive solution to the issue would be to bring the identifier into scope right after the
assignment operator (\mintinline{go}|=|). However, this would introduce further
edge cases, as for example \mintinline{go}|x := x|, and extra measures would have
to be taken to make such constructs illegal again.

However, there is a simple, although verbose solution to this problem: The function
has to be declared in advance:

\begin{listing}
	\begin{gocode}
func X() {
	var f func() // f enters scope after this line
	f = func() {
		f() // this is allowed, as f is in scope
	}
}
\end{gocode}
	\caption{Fixing the scope issue on recursive functions}
\end{listing}
This exception will need to be allowed by the assignment checker, as recursive functions
are one of the building blocks in functional programming.

\subsubsection{Multiple Assignments}\label{sec:multi-assign}

\newglossaryentry{ebnf}{name=EBNF, description={Extended Backus-Naur Form, an extended version of the `Backus-Naur
Form, a notation technique to describe programming language syntax}}

As discussed at the beginning of this Chapter, the short variable declaration brings
a variable into scope and assigns it to a given expression (value). It is also
possible to declare multiple variables at once, as can be seen in the \gls{ebnf}
notation:
\begin{listing}
	\begin{bnfcode}
ShortVarDecl = IdentifierList ":=" ExpressionList .
	\end{bnfcode}
\end{listing}
This clearly shows that both left- and right-hand side take a list, meaning one or more
elements, making a statement like \mintinline{go}|x, y := 1, 2| valid\footnote{This also
	holds true for the \mintinline{go}|var| keyword}.

However, there is an important note to be found in the language specification:

\begin{quote}
Unlike regular variable declarations, a short variable declaration may redeclare
variables provided they were originally declared earlier in the same block (or
the parameter lists if the block is the function body) with the same type, and
at least one of the non-blank\footnote{The blank identifier (\mintinline{go}|_|)
    discards a value; \mintinline{go}|x, _ := f()|} variables is new. As a consequence, redeclaration
can only appear in a multi-variable short declaration. Redeclaration does not
introduce a new variable; it \textbf{just assigns a new value to the original}\autocite{short-hand-decl}.
    \begin{gocode}
field1, offset := nextField(str, 0)
field2, offset := nextField(str, offset)  // redeclares offset
    \end{gocode}
\end{quote}

This should not be allowed, as the programmer can introduce mutation by redeclaring
already existing variables in a short variable declaration.

\subsubsection{For Loops}

Although the aforementioned rules cover almost every case of reassignment, they miss
one of the most important parts: For loops.

In Go, there is only one looping construct, the \mintinline{go}|for| loop. However, it can
be used in roughly four different ways:

The infinite loop:
\begin{gocode}
for {
	// ...
}
\end{gocode}
The while loop:
\begin{gocode}
for x == true { // boolean expression
	// ...
}
\end{gocode}
The regular C-style for loop:
\begin{gocode}
for x := 0; x < 10; x++ {
	// ...
}
\end{gocode}
And the range loop, which allows to range (iterate) over slices, maps and channels\footnote{Channels
have not been discussed in this thesis, but are an important building block in Go's concurrency
features. They work similarly to Unix Pipes (`|'), in that they allow to send data in and receive
it on the other `side'. A \mintinline{go}|range| receives as long as the channel is not closed.}:
\begin{gocode}
for idx, elem := range []int{1,8,5,4} {
	// ...
}
\end{gocode}
Range returns two values, both of which can be omitted\footnote{The values can be omitted
    by using the blank identifier `\_'. Skipping the index is thus
    \mintinline{go}|_, elem := range s|, while skipping the value with
\mintinline{go}|i, _ := range s| which is equal to just \mintinline{go}|i := range s|.}. When
ranging over a slice, the index and a copy of the element at that index is returned. For
maps, the key and a copy of the corresponding value is returned.

Except of the C-style for loop, the loops do not have a reassignment statement when
converted to the AST. For this reason, they would not be detected by the previously defined rule and thus
the programmer could use them.
However, issues emerge when having a closer look at the loop constructs.

The \mintinline{go}|range| loop needs to keep track of the element, and it does so (internally)
by using a pointer. This results in a (internal) reassignment.

The regular C-style for loop and the `while' loop do not work at all without reassignments.
Although in the while-loop the reassignment happens at another place, most probably within the loop,
it is easier for the user if a report on the loop construct is generated instead.

The infinite loop does not contain any reassignments and is thereby be legal by the
definition of the reassignment rule. Because of this, it should not be reported.

All other for-loops need to be reported as they contain internal reassignments.