April 3, 2020

value semantics and pointer semantics

I learned in Bill Kennedy’s Ultimate Go training that the pointer semantics of go code help guide software design. Semantics here means what the code does with data, whether it copies it or shares it, which is different from the vague notion of pointer syntax (the & and * on the page). As a long-time student of syntax vs semantics in logic and mathematics, I found Kennedy’s teaching refreshing, and I finally began to understand a fundamental philosophy underpinning go.

See this blog series and these study materials for a more complete explanation. This post serves as my personal recall of the workshop.

Vague Notions

Ruby developers should be familiar with the instance variable:

@luggage = { password: 12345 }

def check_password(data = {})
  data[:password] == 12345
end

check_password @luggage

The instance variable holds a reference to a hash allocated on the heap, and that reference is visible anywhere within the scope where it was declared. Ruby abstracts away the notion of pointers through its object-oriented semantics (every variable refers to an object), so we only need to recognize the scoping rules to use references appropriately. The interpreter and its garbage collector take care of (most of) the rest, and a few guidelines keep programmers from making too much of a mess. We can happily follow those guidelines and ignore the machine-level details.

Incorrect Semantics

But what about go? The temptation might be to write this, which looks syntactically similar to the Ruby snippet:

package main

type luggage struct { password int }

func checkPassword(my *luggage) bool {
  return my.password == 12345
}

func main() {
  my := &luggage{ 12345 }

  checkPassword(my)
}

Both code snippets suffer from the same weakness: they share data by reference even though nothing in the program needs to share it. But our vague notions about how long data lives don’t tell us precisely what’s going on. Yes, we can say that the ruby program doesn’t really need an instance variable, since the data is only used temporarily. But it’s the scoping rules that tell us that in ruby, not any knowledge about the machine. We could mimic the scoping behavior in the go program by reducing the main function to the one-liner

checkPassword(&luggage{ 12345 })

which, given our vague notions about value and reference, looks like we’re just building temporary data within the scope of the function and using pointer syntax to satisfy the function’s parameter type. That reading is fundamentally incorrect: the & is not decoration to satisfy a type, it declares that this value is being shared.

Precise Semantics

Sharing data through a pointer can move it to the heap: whenever the compiler’s escape analysis cannot prove that a value stays within its function’s stack frame, it allocates that value on the heap, where it consumes memory and adds garbage-collection work. Done properly, passing a pointer lets us share the allocated data. But how do we do this properly in go? In ruby, the scoping rules guide the developer and the ruby virtual machine handles the rest. In go, we instead want our programs, in Bill’s phrasing, to be “sympathetic with the machine on which it’s operating”.
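To make sharing concrete, here is a minimal sketch in the same luggage vocabulary. The helpers relockCopy and relockShared are hypothetical, not from the workshop: one receives a copy (value semantics), the other receives an address (pointer semantics). If you want to see where values actually end up, `go build -gcflags=-m` prints the compiler’s escape-analysis decisions; the exact messages vary between compiler versions.

```go
package main

import "fmt"

// luggage mirrors the struct from the post.
type luggage struct {
	password int
}

// relockCopy uses value semantics: it works on its own copy of the
// luggage, so the caller's value is unchanged.
func relockCopy(l luggage, password int) luggage {
	l.password = password
	return l
}

// relockShared uses pointer semantics: it works on the caller's luggage
// through its address, so the change is visible to everyone sharing it.
func relockShared(l *luggage, password int) {
	l.password = password
}

func main() {
	mine := luggage{password: 12345}

	copied := relockCopy(mine, 54321)
	fmt.Println(mine.password, copied.password) // 12345 54321

	relockShared(&mine, 54321)
	fmt.Println(mine.password) // 54321
}
```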

Built-in types in go represent values: string, numeric, bool. On the machine, these are small, fixed-size values: numbers and bools fit in a word or two, and a string is a two-word header (a pointer to its bytes and a length). They can be copied cheaply and handled efficiently on the stack, so there’s very little need to allocate them on the heap. Thus, the semantics should follow a pass-by-value approach for function parameters and struct fields. There are exceptions, but in most cases pass-by-value semantics keeps us sympathetic to the machine and precise in usage.

// always pass builtin values directly
func isEven(n int) bool {
  return n % 2 == 0
}

Reference types in go represent, well, references to values: channel, slice, map, func. Here, the term “reference” is overloaded; we do not necessarily mean pointers. A channel refers to a signaling operation, where the signal is of a given type. A slice is an abstraction on top of an array: a small header holding a pointer to the underlying array, a length, and a capacity. A map similarly abstracts over its underlying data. And a func value refers to compiled code, plus any variables a closure captures. The runtime’s itable, a concept I still need to learn, belongs to interface values rather than funcs.

// always pass reference types directly
func every(f func (int) bool, xs ...int) bool {
  for _, n := range xs {
    if !f(n) {
      return false
    }
  }
  return true
}
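Passing a slice by value copies its header but not its backing array. This sketch (zeroFirst and grow are hypothetical helpers) shows both sides of that: a write through the shared array is visible to the caller, while an append only changes the callee’s copy of the length.

```go
package main

import "fmt"

// zeroFirst writes through the slice's pointer to the shared backing
// array, so the caller sees the change.
func zeroFirst(xs []int) {
	if len(xs) > 0 {
		xs[0] = 0
	}
}

// grow appends to its own copy of the slice header; the caller's length
// is unchanged, so the caller does not see the new element.
func grow(xs []int) []int {
	return append(xs, 99)
}

func main() {
	nums := []int{1, 2, 3}

	zeroFirst(nums)
	fmt.Println(nums) // [0 2 3]

	more := grow(nums)
	fmt.Println(len(nums), len(more)) // 3 4
}
```

This is why append returns a slice that the caller has to keep.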

User-defined types, created using struct syntax, allow the programmer to compose new types, building upon the previous semantics.

type luggage struct {
  password int // value semantics used here
  contents string
}

The first exception to preferring pass-by-value might come up in struct fields, where we want to convey the explicit absence of a value:

type luggage struct {
  password int
  contents *string // the field could be absent as opposed to zero
}

But treat that as a code smell; if you can avoid it, do so.
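A minimal sketch of that optional field, assuming a hypothetical describe helper. It shows the one thing the pointer buys (telling nil apart from an empty string) and the cost that makes it a smell: copying the struct copies only the pointer, so two luggage values end up sharing their contents.

```go
package main

import "fmt"

type luggage struct {
	password int
	contents *string // nil means the contents were never recorded
}

// describe tells an absent field apart from an empty one, which a plain
// string field could not do.
func describe(l luggage) string {
	switch {
	case l.contents == nil:
		return "unknown contents"
	case *l.contents == "":
		return "empty"
	default:
		return *l.contents
	}
}

func main() {
	empty := ""
	fmt.Println(describe(luggage{password: 12345}))                   // unknown contents
	fmt.Println(describe(luggage{password: 12345, contents: &empty})) // empty

	// The smell: copying the struct copies the pointer, not the string,
	// so two "separate" luggage values now share their contents.
	socks := "socks"
	a := luggage{password: 12345, contents: &socks}
	b := a
	*b.contents = "bananas"
	fmt.Println(*a.contents) // bananas
}
```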

Lastly, an interface does not represent a value. An interface represents behavior, and the compiler enforces it. In practice, this means the compiler checks that a type’s method set satisfies the interface; because value receivers and pointer receivers produce different method sets, our choice of semantics decides which values can satisfy it. We’ve come full circle in our philosophy of the language.

import "errors"

type security interface {
  Unlock(int) (string, error)
}

// crack tries passwords 0, 1, 2, ... and returns the first one that works
func crack(s security) (password int) {
  for {
    if _, err := s.Unlock(password); err == nil {
      return
    }
    password++
  }
}

var ErrIncorrectPassword = errors.New("incorrect password")

func (my luggage) Unlock(password int) (string, error) {
  if my.password != password {
    return "", ErrIncorrectPassword
  }

  return my.contents, nil
}
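For reference, here is a runnable sketch assembling the interface, the luggage type, and a brute-force crack that stops at the first password Unlock accepts. It is a toy: crack only returns once some password works.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrIncorrectPassword = errors.New("incorrect password")

type security interface {
	Unlock(int) (string, error)
}

type luggage struct {
	password int
	contents string
}

// Unlock uses a value receiver, so both luggage and *luggage satisfy security.
func (my luggage) Unlock(password int) (string, error) {
	if my.password != password {
		return "", ErrIncorrectPassword
	}
	return my.contents, nil
}

// crack tries passwords 0, 1, 2, ... and returns the first one Unlock accepts.
func crack(s security) (password int) {
	for {
		if _, err := s.Unlock(password); err == nil {
			return password
		}
		password++
	}
}

func main() {
	bag := luggage{password: 12345, contents: "bananas"}
	fmt.Println(crack(bag))  // 12345
	fmt.Println(crack(&bag)) // 12345
}
```

Because Unlock has a value receiver, both a luggage value and a *luggage satisfy security.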

As the behavior changes, we start seeing the semantics come into play. Suppose we also have a type hotelSafe that implements the security interface. To enclose items in the hotel safe, we need to set a password, something we didn’t need to do with the luggage.

type hotelSafe struct {
  password int
  contents string
}

func (my hotelSafe) Unlock(password int) (string, error) {
  if my.password != password {
    return "", ErrIncorrectPassword
  }

  return my.contents, nil
}

func (my *hotelSafe) Lock(password int) {
  my.password = password
}

And now, to remain consistent in our semantics, we should refactor hotelSafe’s implementation of the security interface to use a pointer receiver:

func (my *hotelSafe) Unlock(password int) (string, error)

Why? Consider this program flow

var my hotelSafe

stuff := "bananas"

my.contents = stuff
my.Lock(12345) // same as my luggage!

if _, err := my.Unlock(12345); err != nil {
  // I forgot my password, but I need my stuff!
  crack(&my)
}

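The code runs perfectly fine. But we’ve forced ourselves to mix value and pointer semantics on one type: Lock shares the hotelSafe through a pointer, while Unlock works on a copy. In a complicated flow, that mixture makes bugs likely, for example a copy taken before a Lock that never sees the new password. Consistency reduces the chance that we modify shared data while a stale copy of it is still in use.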

Still Learning

If you’re like roughly 1 out of 5 software developers, you never took a computer science course in college. You may have heard of terms like pass by value and pass by reference, have some vague notion of a computer program’s stack and its heap, and, like me, have gotten along pretty well without thinking much about the implications. Modern interpreted programming languages are wonderful at abstracting away the details and letting us be productive in solving practical business problems.

I always found explanations of the stack and heap hideously boring, dense, and hopelessly confusing. Pass-by-value is when a function operates with data on the stack; pass-by-reference is when a function operates with data on the heap. The stack is where the program runs, instruction by instruction, like a tape on a Turing machine or something. Memory on the stack lives only during the program execution. The heap is where the program stores data for random access; the program uses addresses, handled on the stack, to read or write memory during execution. A pointer is an address: it points to the data in memory. And when you share memory between functions without passing data directly via parameters, you’re really passing pointers.

Like I said: vague notions. The topic never really was my cup of tea, even though I could work effectively with code. By paying attention to data semantics and looking for “sympathy” with the machine’s architecture, I’m finally able to wrap my head around the stack, the heap, and all the CS terminology I never really learned.

Content by © Jared Davis 2019-2020
