You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
447 lines
20 KiB
447 lines
20 KiB
5 years ago
|
To learn functional programming without a purely functional language, the
|
||
|
developer needs to know which statements are functional and which
|
||
|
would not exist or be possible in a purely functional language.
|
||
|
For this reason, a `functional checker' should be created. It should work
|
||
|
in a similar fashion to linters like `shellcheck', `go vet' or `gosimple'\footnote{
|
||
|
A list of Go linters can be found in golangci-lint\autocite{golangci-lint}.
|
||
|
}, with different `rules' that are reported upon. This tool will be named `funcheck'.
|
||
|
|
||
|
A set of rules should be compiled to identify common `non-functional' constructs
|
||
|
which should then be reported upon.
|
||
|
|
||
|
The first step is to define a set of rules.
|
||
|
|
||
|
\subsection{Functional Purity}\label{sec:func-purity}
|
||
|
|
||
|
The goal of funcheck is to report every construct that can not be considered
|
||
|
to be purely functional. As there is no agreed upon definition on what
|
||
|
functional purity is\autocite{functional-controversy}, the first step is to
|
||
|
define functional purity for the context of this thesis:
|
||
|
|
||
|
\begin{quote}
|
||
|
A style of building the structure and elements of computer programs that
|
||
|
treats all computation as the evaluation of mathematical functions. Purely
|
||
|
functional programming may also be defined by forbidding changing state and
|
||
|
mutable data.
|
||
|
|
||
|
Purely functional programming consists in ensuring that functions, inside
|
||
|
the functional paradigm, will only depend on their arguments, regardless of
|
||
|
any global or local state.\autocite{functional-purity-wiki}
|
||
|
\end{quote}
|
||
|
|
||
|
This definition roughly brakes down into two parts; immutability and function
|
||
|
purity. These parts and how they translate to Go will be discussed in the
|
||
|
following chapters.
|
||
|
|
||
|
\subsection{Forbidding mutability and changing state}\label{sec:mutability}
|
||
|
|
||
|
\newglossaryentry{copy-by-value}{name=copy by value, description={Copy by value
|
||
|
refers to the argument-passing style where the supplied arguments are
|
||
|
copied. To pass references in Go, the developer needs to use pointers
|
||
|
instead}}
|
||
|
|
||
|
Immutability refers to objects --- variables --- that are unchangeable. This means
|
||
|
that their state cannot be modified after it's creation and first assignment.
|
||
|
|
||
|
Many programming languages provide features for making variables immutable.
|
||
|
|
||
|
In Rust for example, variables are immutable by default. To explicitly declare
|
||
|
a mutable variable, the \mintinline{rust}|mut| keyword has to be used: \mintinline{rust}|let mut x = 5;|.
|
||
|
|
||
|
Java, in contrast, uses the \mintinline{java}|final| keyword:
|
||
|
\begin{quote}
|
||
|
A final variable can only be initialized once\autocite{final-java}
|
||
|
\end{quote}
|
||
|
`Only be initialized once' means that it cannot be reassigned later, effectively
|
||
|
making that variable immutable.
|
||
|
A caveat with Java's \mintinline{java}|final| keyword is that the immutability
|
||
|
is only tied to the reference of that object, not
|
||
|
it's contents (one may still change a field of an immutable object).
|
||
|
|
||
|
C and C++ have two features to achieve immutability, the \mintinline{c}|#define|
|
||
|
preprocessor and the \mintinline{c}|const| keyword.
|
||
|
The \mintinline{c}|#define| directive does not declare variables, but rather
|
||
|
works in a similar way to string replacement at compile time. These directives
|
||
|
can also be expressions. For example, this is a valid define statement:
|
||
|
\begin{ccode}
|
||
|
#define AGE (20/2)
|
||
|
\end{ccode}
|
||
|
|
||
|
However, because \mintinline{c}|#define| is just text substitution, these
|
||
|
identifiers are untyped.
|
||
|
|
||
|
The \mintinline{c}|const| qualifier specifies that a variable's value
|
||
|
will not be changed, making it immutable. This is effectively the equivalent
|
||
|
to Java's \mintinline{java}|final|.
|
||
|
|
||
|
Go has the \mintinline{go}|const| keyword too, and similar to C, constants can only be
|
||
|
characters, strings, booleans or numeric values\footnote{In contrast to C, Go does
|
||
|
have a boolean type}. A complex type --- a struct, interface or function --- cannot be constant.
|
||
|
|
||
|
This means that the programmer cannot write
|
||
|
\mintinline{go}|const x = MyStruct{A: a, B: b}| to make a struct immutable.
|
||
|
|
||
|
A solution to making variables
|
||
|
immutable is to explicitly not allow any mutations in a program, effectively
|
||
|
disallowing all reassignments.
|
||
|
To do this in Go, the simplest solution is
|
||
|
to disallow the assignment operator \mintinline{go}|=|, and only
|
||
|
allow it in combination with a declaration (\mintinline{go}|:=|).
|
||
|
|
||
|
This solution also has the side-effect that it makes it impossible to mutate
|
||
|
existing, complex data structures like maps and slices\footnote{By not
|
||
|
allowing assignments, elements can not be updated by
|
||
|
\mintinline{go}|s[0] = "zero"| / \mintinline{go}|m["key"] = "value"|.}.
|
||
|
|
||
|
There are additional operators that work in a similar way to the assignment
|
||
|
operator, for example \mintinline{go}|++|, \mintinline{go}|--| and
|
||
|
\mintinline{go}|+=|. These are all `hidden' assignments and will need to
|
||
|
be reported upon too.
|
||
|
|
||
|
Go being a \gls{copy-by-value} language, mutating existing variables
|
||
|
across functions works by passing pointers to these variables\footnote{
|
||
|
An example for this can be found in Appendix~\ref{appendix:mutation}.}.
|
||
|
When removing the regular assignment operators, the usage of pointers becomes
|
||
|
second nature, as it cannot be updated either way.
|
||
|
Technically, the only advantage that pointers have left, is that less memory has
|
||
|
to be copied\footnote{The pointer still has to be passed, and thus copied, to the
|
||
|
called function} if a variable's type is big enough\footnote{For example, it is
|
||
|
cheaper to copy an \mintinline{go}|int32| than to take address of it, copy that
|
||
|
(64 bit, on most systems) address and dereferencing it again}.
|
||
|
This optimisation is left to the developer.
|
||
|
|
||
|
In functional languages in contrast, this optimisation is usually left to the compiler
|
||
|
and there are no pointers at all in the language\footnote{Except for things like the `foreign
|
||
|
function interface' (FFI) in Haskell, which allows to call C functions from Haskell code}.
|
||
|
|
||
|
A feature not discussed so far is shadowing. Shadowing a variable in a different
|
||
|
scope is still possible by redeclaring it. As expected, even with the above
|
||
|
mentioned limitations, the rules for shadowing variables do not change. This
|
||
|
can be seen in Appendix~\ref{appendix:shadowing}.
|
||
|
|
||
|
\subsection{Pure functions}
|
||
|
|
||
|
A pure function is a function where the return value is only dependent on
|
||
|
the arguments. With the same arguments, the function should always return
|
||
|
the same values. The evaluation of the function is also not allowed to have
|
||
|
any `side effects', meaning that it should not mutate non-local variables or
|
||
|
references\footnote{This also includes `local static' variables, but Go does
|
||
|
not have a way to make variables `static'.}.
|
||
|
|
||
|
As discussed in the last Chapter~\ref{sec:mutability}, if reassignments
|
||
|
are not allowed, mutating global variables is not possible either.
|
||
|
|
||
|
If global state cannot be changed, it also cannot influence a function's return values.
|
||
|
|
||
|
As such, forbidding reassignments is an extremely simple solution to
|
||
|
ensure functional purity within the program. However, there is one
|
||
|
important aspect that the solution does not solve: interacting with the outside world.
|
||
|
|
||
|
\subsection{IO}
|
||
|
Network and file access, time, randomness, user input and sensors are all examples
|
||
|
of IO, things that interact with state outside of the program.
|
||
|
|
||
|
Input and Output in a program --- through whether channel it may be --- is
|
||
|
impure by design. Due to this, the Haskell language authors faced a difficult
|
||
|
challenge; how to add impure functions to an otherwise completely pure
|
||
|
language.
|
||
|
|
||
|
In a pure language, functions only depend on their arguments, have
|
||
|
no side effects and return a value computed from these arguments. Because
|
||
|
of this, the execution order of functions is not relevant, a function
|
||
|
can be executed whenever its return value is needed.
|
||
|
Due to this, the compiler can optimise the code, calling the functions
|
||
|
in a specific (or in no specific) order or omitting calls if their return
|
||
|
values are unused.
|
||
|
|
||
|
However, impure IO functions introduce some difficulties.
|
||
|
|
||
|
For example, one would expect that it is possible to write a trivial `writeFile'
|
||
|
function that takes a filename and a string with the contents for that file with
|
||
|
the function definition roughly being:
|
||
|
|
||
|
\begin{haskellcode}
|
||
|
writeFile :: String -> String
|
||
|
\end{haskellcode}
|
||
|
Supposed that this function could be called with
|
||
|
\mintinline{haskell}|writeFile "hello.txt" "Hello World"|. The function
|
||
|
does not return anything, or its return value, for example the number
|
||
|
of bytes that have been written to the file, is discarded on the call site.
|
||
|
|
||
|
As there is no dependency on that function anymore, the compiler assumes that
|
||
|
this function does not need to be executed at all --- it still expects all
|
||
|
functions to be pure (even if this is not the case in this example).
|
||
|
Being able to follow the purity guarantees, the compiler emits the call
|
||
|
to the writeFile function and the file never gets written.
|
||
|
|
||
|
Similarly, for getting user input:
|
||
|
\begin{haskellcode}
|
||
|
getChar :: Char
|
||
|
\end{haskellcode}
|
||
|
If this function is called multiple times, the compiler may reorder the
|
||
|
execution of these calls, again, because it expects the functions to be
|
||
|
pure, so the execution order should not matter:
|
||
|
\begin{haskellcode}
|
||
|
get2Chars = [getChar, getChar]
|
||
|
\end{haskellcode}
|
||
|
However, in this example, execution order is extremely important.
|
||
|
|
||
|
To solve this problem, Haskell introduced the IO monad, in which
|
||
|
all functions are wrapped with a `RealWorld' argument. So the
|
||
|
\mintinline{haskell}|getChar| example would be written as\footnote{
|
||
|
This is not the actual function header for getChar, only an illustration.
|
||
|
The actual function header of getChar is \mintinline{haskell}|getChar :: IO Char|.
|
||
|
}:
|
||
|
|
||
|
\begin{haskellcode}
|
||
|
getChar :: RealWorld -> (Char, RealWorld)
|
||
|
\end{haskellcode}
|
||
|
This can be read as `take the current RealWorld, do something with it,
|
||
|
and return a Char and a (possibly changed) RealWorld'\autocite{haskell-io}.
|
||
|
|
||
|
With this solution, the compiler cannot reorder or omit IO actions, ensuring
|
||
|
the correct execution of the program.
|
||
|
|
||
|
The IO monad is a sophisticated solution to work around the purity guarantees and
|
||
|
enables Haskellers to use impure functions in an otherwise completely pure language.
|
||
|
Put another way: the IO monad exists because of compiler optimisations, which in turn are
|
||
|
based on the assumption that everything is pure.
|
||
|
|
||
|
In Go, the compiler does not and can not make this assumption. There also
|
||
|
is no `pure' keyword to mark a function as such. For that reason, the
|
||
|
compiler cannot optimise as aggressively, and the whole IO problem does
|
||
|
not exist. However, in the context of this thesis, interacting with the outside world ---
|
||
|
using IO-functions --- means that functional purity cannot be guaranteed anymore.
|
||
|
Haskell's solution to this issue is relatively complex, and while it may be applicable
|
||
|
to Go, it would require massively rewriting Go's standard library. Furthermore,
|
||
|
the IO monad does not make impure functions magically pure. Rather, it
|
||
|
clearly marks functions as impure.
|
||
|
|
||
|
Another issue, apart from the amount of work that would be needed, is that
|
||
|
the goal is to provide an easy introduction to functional programming.
|
||
|
IO, as one may see in these examples, is one of the hardest topics.
|
||
|
|
||
|
For those reasons, although it may violate the functional purity requirements,
|
||
|
the underlying issue with IO will be ignored. This means that the programmer
|
||
|
will still be able to use IO functions as in regular Go programs.
|
||
|
|
||
|
\subsection{Assignments in Go}
|
||
|
|
||
|
As described in Section~\ref{sec:mutability}, the only check that needs
|
||
|
to be done is that there are no reassignments in the code. There are a few
|
||
|
exceptions to this rule, which will be covered later. First, it is important to have
|
||
|
a few examples and test cases to be clear on what should be reported.
|
||
|
|
||
|
To declare variables in Go, the \mintinline{go}|var| keyword is used.
|
||
|
The var keyword needs to be followed by one or more identifiers and maybe
|
||
|
types or the initial value. These are all valid declarations:
|
||
|
|
||
|
\begin{listing}
|
||
|
\begin{gocode}
|
||
|
var i int
|
||
|
var U, V, W float64
|
||
|
var k = 1
|
||
|
var x, y float32 = -1, -2
|
||
|
\end{gocode}
|
||
|
\caption{Go Variable Declarations}
|
||
|
\end{listing}
|
||
|
What can be seen in this example is that the type of the variable can be
|
||
|
deducted automatically\footnote{The compiler will infer the type at compile
|
||
|
time. That operation is always successful, although it may not be what
|
||
|
the programmer desires. For example, \mintinline{go}|var x = 5| will
|
||
|
always result in \mintinline{go}|x| to be of type \mintinline{go}|int|.},
|
||
|
or be specified explicitly.
|
||
|
|
||
|
However, the notation \mintinline{go}|var k = 1| is seldomly seen. This
|
||
|
is due to the fact that there is a second way to declare and initialise
|
||
|
variables to a specific value, the `short variable declaration' syntax:
|
||
|
|
||
|
\begin{gocode}
|
||
|
k := 1
|
||
|
\end{gocode}
|
||
|
\begin{quote}
|
||
|
It is shorthand for a regular variable declaration with initializer expressions but no types:
|
||
|
|
||
|
\begin{bnfcode}
|
||
|
"var" IdentifierList = ExpressionList .
|
||
|
\end{bnfcode}
|
||
|
\autocite{short-hand-decl}
|
||
|
\end{quote}
|
||
|
|
||
|
In practice, the short variable declaration is used more often due to the
|
||
|
fact that there is no need to specify the type of the variable manually and
|
||
|
is shorter than its counterpart \mintinline{go}|var x = y|.
|
||
|
|
||
|
Go also has increment, decrement and various other operators to reassign
|
||
|
variables. Most of the operators that take a `left' and `right' expression
|
||
|
also have a short-hand notation to reassign the value back to the variable,
|
||
|
for example\footnote{This is a non-exhaustive list of operators. A complete listing
|
||
|
of operators can be found in the Go language spec\autocite{spec-operators}.}:
|
||
|
|
||
|
|
||
|
\begin{listing}
|
||
|
\begin{gocode}
|
||
|
var x = 5
|
||
|
x += 5 // equal to x = x + 5
|
||
|
x %= 3 // equal to x = x % 3
|
||
|
x <<= 2 // equal to x = x << 2 (bit-shift)
|
||
|
\end{gocode}
|
||
|
\caption{Go Assignment Operators}
|
||
|
\end{listing}
|
||
|
All these operators are reassigning values and should be reported.
|
||
|
|
||
|
However, as mentioned in the beginning of this chapter, there are a few exceptions
|
||
|
to this rule:
|
||
|
|
||
|
\subsubsection{Function Assignments}\label{sec:func-reassign}
|
||
|
|
||
|
A variable declaration brings an identifier into scope. The Go Language Specification
|
||
|
defines this scope according to the following rule:
|
||
|
\begin{quote}
|
||
|
The scope of a constant or variable identifier declared inside a function begins
|
||
|
at the end of the ConstSpec or VarSpec (ShortVarDecl for short variable
|
||
|
declarations) and ends at the end of the innermost containing block.
|
||
|
\autocite{spec-scope}\end{quote}
|
||
|
|
||
|
This definition holds an important caveat for function definitions:
|
||
|
\begin{quote}
|
||
|
The scope [...] begins \textbf{at the end} of the ConstSpec or VarSpec [...].
|
||
|
\end{quote}
|
||
|
|
||
|
This means that when defining a function, the functions identifier is not
|
||
|
in scope within that function:
|
||
|
|
||
|
\begin{listing}
|
||
|
\begin{gocode}
|
||
|
func X() {
|
||
|
f := func() {
|
||
|
f() // <- 'f' is not in scope yet.
|
||
|
}
|
||
|
}
|
||
|
\end{gocode}
|
||
|
\caption{Go scoping issue with recursive functions}
|
||
|
\end{listing}
|
||
|
In the above example, the function \mintinline{go}|f| cannot be called recursively. This is due
|
||
|
to the fact that the identifier \mintinline{go}|f| is not in scope at \mintinline{go}|f|'s definition.
|
||
|
Instead, it only enters scope after the closing brace of function \mintinline{go}|f|.
|
||
|
|
||
|
This cannot be changed within the compiler without touching a lot of the scoping
|
||
|
rules and changing these scoping rules may have unintended side effects. To exemplify,
|
||
|
a naive solution to the issue would be to bring the identifier into scope right after the
|
||
|
assignment operator (\mintinline{go}|=|). However, this would introduce further
|
||
|
edge cases, as for example \mintinline{go}|x := x|, and extra measures would have
|
||
|
to be taken to make such constructs illegal again.
|
||
|
|
||
|
However, there is a simple, although verbose solution to this problem: The function
|
||
|
has to be declared in advance:
|
||
|
|
||
|
\begin{listing}
|
||
|
\begin{gocode}
|
||
|
func X() {
|
||
|
var f func() // f enters scope after this line
|
||
|
f = func() {
|
||
|
f() // this is allowed, as f is in scope
|
||
|
}
|
||
|
}
|
||
|
\end{gocode}
|
||
|
\caption{Fixing the scope issue on recursive functions}
|
||
|
\end{listing}
|
||
|
This exception will need to be allowed by the assignment checker, as recursive functions
|
||
|
are one of the building blocks in functional programming.
|
||
|
|
||
|
\subsubsection{Multiple Assignments}\label{sec:multi-assign}
|
||
|
|
||
|
\newglossaryentry{ebnf}{name=EBNF, description={Extended Backus-Naur Form, an extended version of the `Backus-Naur
|
||
|
Form, a notation technique to describe programming language syntax}}
|
||
|
|
||
|
As discussed at the beginning of this Chapter, the short variable declaration brings
|
||
|
a variable into scope and assigns it to a given expression (value). It is also
|
||
|
possible to declare multiple variables at once, as can be seen in the \gls{ebnf}
|
||
|
notation:
|
||
|
\begin{listing}
|
||
|
\begin{bnfcode}
|
||
|
ShortVarDecl = IdentifierList ":=" ExpressionList .
|
||
|
\end{bnfcode}
|
||
|
\end{listing}
|
||
|
This clearly shows that both left- and right-hand side take a list, meaning one or more
|
||
|
elements, making a statement like \mintinline{go}|x, y := 1, 2| valid\footnote{This also
|
||
|
holds true for the \mintinline{go}|var| keyword}.
|
||
|
|
||
|
However, there is an important note to be found in the language specification:
|
||
|
|
||
|
\begin{quote}
|
||
|
Unlike regular variable declarations, a short variable declaration may redeclare
|
||
|
variables provided they were originally declared earlier in the same block (or
|
||
|
the parameter lists if the block is the function body) with the same type, and
|
||
|
at least one of the non-blank\footnote{The blank identifier (\mintinline{go}|_|)
|
||
|
discards a value; \mintinline{go}|x, _ := f()|} variables is new. As a consequence, redeclaration
|
||
|
can only appear in a multi-variable short declaration. Redeclaration does not
|
||
|
introduce a new variable; it \textbf{just assigns a new value to the original}\autocite{short-hand-decl}.
|
||
|
\begin{gocode}
|
||
|
field1, offset := nextField(str, 0)
|
||
|
field2, offset := nextField(str, offset) // redeclares offset
|
||
|
\end{gocode}
|
||
|
\end{quote}
|
||
|
|
||
|
This should not be allowed, as the programmer can introduce mutation by redeclaring
|
||
|
already existing variables in a short variable declaration.
|
||
|
|
||
|
\subsubsection{For Loops}
|
||
|
|
||
|
Although the aforementioned rules cover almost every case of reassignment, they miss
|
||
|
one of the most important parts: For loops.
|
||
|
|
||
|
In Go, there is only one looping construct, the \mintinline{go}|for| loop. However, it can
|
||
|
be used in roughly four different ways:
|
||
|
|
||
|
The infinite loop:
|
||
|
\begin{gocode}
|
||
|
for {
|
||
|
// ...
|
||
|
}
|
||
|
\end{gocode}
|
||
|
The while loop:
|
||
|
\begin{gocode}
|
||
|
for x == true { // boolean expression
|
||
|
// ...
|
||
|
}
|
||
|
\end{gocode}
|
||
|
The regular C-style for loop:
|
||
|
\begin{gocode}
|
||
|
for x := 0; x < 10; x++ {
|
||
|
// ...
|
||
|
}
|
||
|
\end{gocode}
|
||
|
And the range loop, which allows to range (iterate) over slices, maps and channels\footnote{Channels
|
||
|
have not been discussed in this thesis, but are an important building block in Go's concurrency
|
||
|
features. They work similarly to Unix Pipes (`|'), in that they allow to send data in and receive
|
||
|
it on the other `side'. A \mintinline{go}|range| receives as long as the channel is not closed.}:
|
||
|
\begin{gocode}
|
||
|
for idx, elem := range []int{1,8,5,4} {
|
||
|
// ...
|
||
|
}
|
||
|
\end{gocode}
|
||
|
Range returns two values, both of which can be omitted\footnote{The values can be omitted
|
||
|
by using the blank identifier `\_'. Skipping the index is thus
|
||
|
\mintinline{go}|_, elem := range s|, while skipping the value with
|
||
|
\mintinline{go}|i, _ := range s| which is equal to just \mintinline{go}|i := range s|.}. When
|
||
|
ranging over a slice, the index and a copy of the element at that index is returned. For
|
||
|
maps, the key and a copy of the corresponding value is returned.
|
||
|
|
||
|
Except of the C-style for loop, the loops do not have a reassignment statement when
|
||
|
converted to the AST. For this reason, they would not be detected by the previously defined rule and thus
|
||
|
the programmer could use them.
|
||
|
However, issues emerge when having a closer look at the loop constructs.
|
||
|
|
||
|
The \mintinline{go}|range| loop needs to keep track of the element, and it does so (internally)
|
||
|
by using a pointer. This results in a (internal) reassignment.
|
||
|
|
||
|
The regular C-style for loop and the `while' loop do not work at all without reassignments.
|
||
|
Although in the while-loop the reassignment happens at another place, most probably within the loop,
|
||
|
it is easier for the user if a report on the loop construct is generated instead.
|
||
|
|
||
|
The infinite loop does not contain any reassignments and is thereby be legal by the
|
||
|
definition of the reassignment rule. Because of this, it should not be reported.
|
||
|
|
||
|
All other for-loops need to be reported as they contain internal reassignments.
|