September 7, 2019

Go sort

I previously posted code that adjusts times between timezones. Writing that code, I did not want to rely on any external libraries for text search, so I instead used the sort package to create a minimal search function. There are many good tutorials on sort; this post merely shows how I used it.

My main data structure was a struct describing a particular timezone:

// TimeZone represents the region and subregion of a timezone,
// and provides a reference to the Unix database
type TimeZone struct {
	UnixZoneInfo string
	Region       string
	Subregion    string

	location *time.Location
}

In the code, I create instances of this struct by reading the Unix timeinfo database from disk. Really, I’m just reading the names of entries as a later function call reads the actual data to make time adjustments.

Since there are about 570 or so entries, I want the selection of a particular subset of entries to be efficient. So I reached for a binary sort to select matching subregions in O(log(n)) time. I could have built a map for O(1) constant lookup – but that would have been boring in addition to being smart.

Three functions need to be implemented for the Sort interface: one for determining the length of the collection, one for swapping entries in the collection, and one for establishing order in the collection. In order to sort TimeZones by subregions, I defined a type for the collection and to implement the interface:

// BySubregion is a container for TimeZone pointers, implementing the Sort interface
type BySubregion []*TimeZone

func (tz BySubregion) Len() int           { return len(tz) }
func (tz BySubregion) Swap(a, b int)      { tz[a], tz[b] = tz[b], tz[a] }
func (tz BySubregion) Less(a, b int) bool { return tz[a].Subregion > tz[b].Subregion }

Once the BySubregion has been sorted – an O(n) operation, I think – then we can easily conduct a binary search along its members. Here, I implemented this as a single function, using a closure to express the bestMatch for an entry:

// Find returns a pointer to a TimeZone matching the given subregion string
func (tz BySubregion) Find(subregion string) *TimeZone {
	bestMatch := func(i int) bool { return tz[i].Subregion <= subregion }

	if !sort.IsSorted(tz) {
		sort.Sort(tz)
	}

	idx := sort.Search(tz.Len(), bestMatch)
	return tz[idx]
}

At worst, this Find function will run O(n*log(n)) as it sorts then searches; at best, it runs O(log(n)) on an already sorted collection.

That’s about it. Do feel free to look at more detailed explanations of the sort package. I hope this post gives a nice easy, practical example for how to use it.

Content by © Jared Davis 2019
Theme by © Emir Ribic 2017

Powered by Hugo & Kiss.