As discussed in Chapter~\ref{sec:funcheck-theory}, to assist writing purely
functional code, a linter needs to be implemented that detects reassignments within
a Go program.
To get a grasp on the issues this linter should report, the first step
is to collect some example cases that it should match.
\subsection{Examples}
The simplest cases are standalone reassignments and assignment operators:
\begin{gocode}
x := 5
x = 6 // forbidden
// or
var y = 5
y = 6 // forbidden
y += 6 // forbidden
y <<= 2 // forbidden
y++ // forbidden
\end{gocode}
The statements marked with a \mintinline{go}|// forbidden| comment
should be reported.
Adding block scoping to this, shadowing the old variable needs to be allowed:
\begin{gocode}
x := 5
{
x = 6 // forbidden, changing the old value
x := 6 // allowed, as this shadows the old variable
}
\end{gocode}
What should be illegal is to declare the variable first and then assign a
value to it:
\begin{gocode}
var x int
x = 6 // forbidden
\end{gocode}
Functions are the exception here, as they need to be declared first in order
to call them recursively:
\begin{gocode}
var f func()
f = func() {
f()
}
\end{gocode}
Furthermore, the linter also needs to be able to handle multiple variables
at once:
\begin{gocode}
var f func()
x, f, y := 1, func() { f() }, 2
\end{gocode}
Loops should be reported too, as they are using reassignments internally:
\begin{gocode}
for i := 0; i < 5; i++ { // forbidden
for i != 3 { // forbidden
for { // allowed
// ...
}
}
}
\end{gocode}
All the aforementioned examples and more can be found in the test cases for funcheck\autocite{funcheck-examples}.
\subsection{Building a linter}
The Go ecosystem already provides an official library for building code analysis tools,
the `analysis' package from the Go Tools repository\autocite{go-analysis}. With this package,
implementing a static code analyser is reduced to writing the actual AST node analysis.
To define an analysis, a variable of type \mintinline{go}|*analysis.Analyzer| has to be declared:
\begin{gocode}
var Analyzer = &analysis.Analyzer{
	Name: "assigncheck",
	Doc:  "reports reassignments",
	// run (placeholder name) has the signature
	// func(*analysis.Pass) (interface{}, error)
	Run: run,
}
\end{gocode}
The necessary steps are now adding the `Run' function and registering the analyser
in the \mintinline{go}|main()| function.
The `Run' function takes an argument of type \mintinline{go}|*analysis.Pass|. The Pass provides
information about the package that is being analysed and some helper functions to report
diagnostics.
With \mintinline{go}|analysis.Pass.Files| and the help of the `go/ast' package, traversing the syntax
tree of every file in a package becomes convenient:
\begin{gocode}
for _, file := range pass.Files {
	ast.Inspect(file, func(n ast.Node) bool {
		// node analysis here
		return true // continue with the children of n
	})
}
\end{gocode}
To implement funcheck as described, five different AST node types need to be
taken care of. The simpler ones are
\mintinline{go}|*ast.IncDecStmt|, \mintinline{go}|*ast.ForStmt| and \mintinline{go}|*ast.RangeStmt|.
An `IncDecStmt' node is a \mintinline{go}|x++| or \mintinline{go}|x--|
statement and should always be reported.
`ForStmt' and `RangeStmt' are similar; a `RangeStmt' is a `for' loop with the
\mintinline{go}|range| keyword instead of an init-, condition- and post-statement.
Both of these loop-types need to be reported explicitly as they do not show up
as reassignments in the AST.
Thus, the basic building block for the AST traversal is the following \mintinline{go}|switch|
statement:
\begin{code}
\gofilerange{../work/funcheck/assigncheck/assigncheck.go}{start-basictypes}{end-basictypes}%
\caption{Handling the basic AST types in funcheck\autocite{funcheck-ast-types}}
\end{code}
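The following is a minimal, self-contained sketch of such a traversal that uses only the
standard library instead of the `analysis' package. The function name \mintinline{go}|reportBasic|
and the message texts are assumptions made for illustration and are not taken from funcheck.
Following the loop examples above, the sketch leaves a bare \mintinline{go}|for { }| loop alone.
\begin{gocode}
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// reportBasic lists every x++/x-- statement, conditional for loop and range
// loop in src. The message texts are placeholders, not funcheck's output.
func reportBasic(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var msgs []string
	report := func(n ast.Node, what string) {
		line := fset.Position(n.Pos()).Line
		msgs = append(msgs, fmt.Sprintf("%d: %s", line, what))
	}
	ast.Inspect(file, func(n ast.Node) bool {
		switch s := n.(type) {
		case *ast.IncDecStmt:
			report(s, "inc/dec statement")
		case *ast.ForStmt:
			// A bare "for { ... }" has neither init, condition nor post
			// statement and is not reported.
			if s.Init != nil || s.Cond != nil || s.Post != nil {
				report(s, "for loop")
			}
		case *ast.RangeStmt:
			report(s, "range loop")
		}
		return true
	})
	return msgs, nil
}

func main() {
	src := `package p

func f(xs []int) {
	n := 0
	n++
	for range xs {
	}
	for n < 3 {
	}
	for {
	}
}`
	msgs, err := reportBasic(src)
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
\end{gocode}
Note that in this sketch the post statement of a three-clause loop such as \mintinline{go}|i++|
is visited as well, so such a loop yields two messages.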
The remaining two node types are \mintinline{go}|*ast.DeclStmt| and \mintinline{go}|*ast.AssignStmt|.
They are not as simple to handle, which is why they are covered in their own sections.
\subsection{Detecting reassignments}
To recapitulate, the goal of this step is to detect all assignments except those to blank identifiers
(discarded values cannot be mutated) and those of function literals, provided the function variable is declared in the
immediately preceding statement\footnote{This rule simplifies the logic of the checker and makes the code easier
for developers to read. It means that no code may appear between \mintinline{go}|var f func|
and \mintinline{go}|f = func() { ... }|.}.
To detect such reassignments, funcheck iterates over all identifiers on the left-hand side
of an assignment statement.
The left-hand side of an assignment is a list of expressions. These expressions can be
identifiers, index expressions (\mintinline{go}|*ast.IndexExpr|, for map and slice access),
star expressions (\mintinline{go}|*ast.StarExpr|\footnote{Star expressions
are expressions that are prefixed by an asterisk, dereferencing a pointer. For example
\mintinline{go}|*x = 5|, if \mintinline{go}|x| is of type \mintinline{go}|*int|.}) or others.
If the expression is not an identifier, the assignment must be a reassignment, as all non-identifier
expressions contain an already declared identifier. For example, the slice index expression
\mintinline{go}|s[5]| is of type \mintinline{go}|*ast.IndexExpr|:
\begin{gocode}
// An IndexExpr node represents an expression followed by an index.
IndexExpr struct {
X Expr // expression
Lbrack token.Pos // position of "["
Index Expr // index expression
Rbrack token.Pos // position of "]"
}
\end{gocode}
Here, \mintinline{go}|IndexExpr.X| is our identifier `s' (of type \mintinline{go}|*ast.Ident|)
and \mintinline{go}|IndexExpr.Index| is \mintinline{go}|5| (of type \mintinline{go}|*ast.BasicLit|).
As these nested identifiers already need to be declared beforehand (else they could not be used
in the expression), all expressions on the left-hand side of an assignment that are not identifiers
are reassignments.
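This first distinction can be sketched with the standard library alone. The helper name
\mintinline{go}|lhsKinds| and its result strings are hypothetical and only serve to show
which left-hand side expressions can be reported straight away and which need a closer look.
\begin{gocode}
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// lhsKinds classifies every left-hand side expression of every assignment
// in src. Identifiers need the later position check, everything else is
// treated as a reassignment. The result strings are illustrative only.
func lhsKinds(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var kinds []string
	ast.Inspect(file, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		if !ok {
			return true
		}
		for _, lhs := range as.Lhs {
			switch e := lhs.(type) {
			case *ast.Ident:
				kinds = append(kinds, e.Name+": identifier")
			default:
				kinds = append(kinds, fmt.Sprintf("%T: reassignment", e))
			}
		}
		return true
	})
	return kinds, nil
}

func main() {
	src := `package p

func f(s []int, p *int) {
	x := 1
	s[5] = x
	*p = 2
}`
	kinds, err := lhsKinds(src)
	if err != nil {
		panic(err)
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
\end{gocode}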
Identifiers are the only expressions that can occur in both declarations and reassignments. A naive
approach would be to check for the colon in a short variable declaration (\mintinline{go}|:=|).
However, as touched upon in Chapter~\ref{sec:multi-assign}, even short variable declarations may
contain redeclarations, if at least one variable is new.
Thus, another approach is needed.
Every identifier (an AST node with type \mintinline{go}|*ast.Ident|) contains an object\footnote{`An
object describes a named language entity such as a package, constant, type, variable,
function (incl. methods), or a label'\autocite{go-ast-object}.} that links to the declaration.
This declaration, of whatever type it may be, always has a position (and a corresponding function
to retrieve that position) in the source file.
A reassignment is detected if an identifier's declaration position does not match the assignment's
position (indicating that the variable is being assigned at a different place to where it is
declared).
This is illustrated in Listing~\ref{code:assign-pos}. In
the assignment \mintinline{go}|y = 3|, \mintinline{go}|y|'s declaration refers to the position
of the first assignment \mintinline{go}|x, y := 1, 2|, the position where \mintinline{go}|y| was
declared.
\begin{listing}
\begin{gocode}
Assignment "x, y := 1, 2": 2958101
Ident "x": 2958101
Decl "x, y := 1, 2": 2958101
Ident "y": 2958104
Decl "x, y := 1, 2": 2958101
Assignment "y = 3": 2958115
Ident "y": 2958115
Decl "x, y := 1, 2": 2958101
\end{gocode}
\caption{Illustration of an assignment node and corresponding positions\autocite{ast-positions}\label{code:assign-pos}}
\end{listing}
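The position comparison can be sketched with the standard library as follows. This is not
funcheck's actual code: the function name \mintinline{go}|reassigned| is an assumption, and the
sketch relies on the parser's syntactic object resolution (\mintinline{go}|Ident.Obj|), which
recent Go releases document as deprecated in favour of the `go/types' package.
\begin{gocode}
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// reassigned returns the names of all identifiers that are assigned at a
// position other than the position of their declaration.
func reassigned(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		if !ok {
			return true
		}
		for _, lhs := range as.Lhs {
			id, isIdent := lhs.(*ast.Ident)
			if !isIdent || id.Obj == nil {
				// Non-identifiers are not handled in this sketch;
				// unresolved names such as "_" carry no object.
				continue
			}
			decl, isNode := id.Obj.Decl.(ast.Node)
			if isNode && decl.Pos() != as.Pos() {
				names = append(names, id.Name)
			}
		}
		return true
	})
	return names, nil
}

func main() {
	src := `package p

func f() {
	x, y := 1, 2
	y = 3
	y, z := 4, 5
	{
		x := 6
		_ = x
	}
	_, _ = x, z
}`
	names, err := reassigned(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
\end{gocode}
In this sketch, the shadowing \mintinline{go}|x := 6| is not reported because its declaration is
the assignment itself, whereas \mintinline{go}|y| is reported both for \mintinline{go}|y = 3| and
for the redeclaration in \mintinline{go}|y, z := 4, 5|.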
As this technique works on an identifier level, multi-variable declarations or assignments
can be verified without any additional effort.
If a variable in a short variable declaration is being reassigned, the `Decl'
field of the variable's object will point to the original position of its declaration, which can be easily detected
(as shown in Listing~\ref{code:assign-pos}).
\subsection{Handling function declarations}
In contrast to all other variable types, function variables may be `reassigned' once.
As discussed in Chapter~\ref{sec:func-reassign}, this is to allow recursive function
literals. Detecting and not reporting these assignments is a two-step process, as two
consecutive AST nodes need to be inspected.
The first step is to detect function declarations, that is, statements of the form
\mintinline{go}|var f func()|. Should such a statement be encountered,
its position is saved for the following AST node.
For the subsequent AST node it is ensured that, if the node is an assignment and
the assigned value is a function literal, the position of the assignee's declaration matches the
previously saved one.
The position of the declaration and the AST node structure can be seen in Listing~\ref{code:func-reassign}.
\begin{listing}
\begin{gocode}
Declaration "var f func() int": 2958142
Ident "f func() int": 2958146
Assignment "f = func() int { return y }": 2958160
Ident "f": 2958160
Decl "f func() int": 2958146
\end{gocode}
\caption{Illustration of a function literal assignment\autocite{ast-positions}\label{code:func-reassign}}
\end{listing}
With this technique it is possible to exempt functions from the reassignment rule.
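The two-step check could look roughly like the sketch below. It is a simplified illustration
under stated assumptions, not funcheck's implementation: the names \mintinline{go}|funcVarSpec|
and \mintinline{go}|exemptFuncAssignments| are hypothetical, and instead of carrying state between
visited nodes, the sketch looks at pairs of neighbouring statements within each block.
\begin{gocode}
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcVarSpec returns the spec of a statement of the form "var f func(...)",
// or nil if stmt has a different shape.
func funcVarSpec(stmt ast.Stmt) *ast.ValueSpec {
	ds, ok := stmt.(*ast.DeclStmt)
	if !ok {
		return nil
	}
	gd, ok := ds.Decl.(*ast.GenDecl)
	if !ok || gd.Tok != token.VAR || len(gd.Specs) != 1 {
		return nil
	}
	vs, ok := gd.Specs[0].(*ast.ValueSpec)
	if !ok || len(vs.Names) != 1 || len(vs.Values) != 0 {
		return nil
	}
	if _, isFunc := vs.Type.(*ast.FuncType); !isFunc {
		return nil
	}
	return vs
}

// exemptFuncAssignments returns the names of function variables that are
// assigned a function literal directly after their declaration.
func exemptFuncAssignments(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		block, ok := n.(*ast.BlockStmt)
		if !ok {
			return true
		}
		for i := 1; i < len(block.List); i++ {
			spec := funcVarSpec(block.List[i-1]) // step 1: saved declaration
			as, ok := block.List[i].(*ast.AssignStmt)
			if spec == nil || !ok || len(as.Lhs) != len(as.Rhs) {
				continue
			}
			for j, lhs := range as.Lhs { // step 2: compare positions
				id, isIdent := lhs.(*ast.Ident)
				if !isIdent || id.Obj == nil {
					continue
				}
				decl, isNode := id.Obj.Decl.(ast.Node)
				_, isLit := as.Rhs[j].(*ast.FuncLit)
				if isNode && isLit && decl.Pos() == spec.Pos() {
					names = append(names, id.Name)
				}
			}
		}
		return true
	})
	return names, nil
}

func main() {
	src := `package p

func g() {
	var f func()
	f = func() { f() }
	var h func()
	_ = 1
	h = func() {}
}`
	names, err := exemptFuncAssignments(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
\end{gocode}
In the sample source, only \mintinline{go}|f| qualifies for the exemption; the assignment to
\mintinline{go}|h| is separated from its declaration by another statement.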
\subsection{Testing funcheck}
The `analysis' package is distributed with a sub-package, `analysistest', which makes
it simple to test a code analysis tool.
By providing test data and the expected messages from funcheck in a structured way, all
that is needed to test funcheck is:
\begin{code}
\begin{gocode}
package assigncheck
import (
"testing"
"golang.org/x/tools/go/analysis/analysistest"
)
func TestRun(t *testing.T) {
analysistest.Run(t, analysistest.TestData(), Analyzer)
}
\end{gocode}
\caption{Testing a code analyser with the `analysistest' package}
\end{code}
The library expects the test data in the current directory in a folder named `testdata' and
then spawns and executes the analyser on the files in that folder. Comments in those files
are used to describe the expected message:
\begin{gocode}
x := 5
fmt.Println(x)
x = 6 // want `^reassignment of x$`
fmt.Println(x)
\end{gocode}
This will ensure that on the line \mintinline{go}|x = 6| an error message is reported that says
`reassignment of x'.