Sure, I can do that!

Developers, operators, and administrators often need a script for operations that are too tedious to do by hand. Whether it is a search or some file manipulation across an entire filesystem or within a certain subset of folders, one might reach a point where bash or batch scripting becomes too complex or tedious to follow.

I’ve loved using Go for scripting as well as for programming web servers for about 3 years now. So naturally, it is my go-to for something that might just as easily have been coded in Ruby or Python. While I feel that these languages are great tools, Go’s verbosity makes code read very clearly to me. It’s fun for me to read and write Go code because 9 times out of 10 I can easily trace what a program is doing internally.

In this article, I’m going to outline how one might build a simple script to solve a real-world problem. This script has not been optimized, because it serves the use case in an acceptable time for the end user, who doesn’t care about very large edge cases.

The object of this piece of software is to aggregate the lines of multiple text files into one file, remove duplicate lines, and alphabetize the results. This replaces a task the end user was doing manually at the end of every workday, probably for multiple folders (what a nightmare!). You can find the final code & binary built for Windows (.exe) here.

Top of the File

Let’s take a look at the initial lines. Since this is a single-file script, the first few lines should look familiar to anyone who has hacked around with Go a little beyond “Hello World”: package declaration, imports, func main. We stay within the standard library, so there’s nothing too wild happening, and any machine with a relatively recent version of Go can compile this program. For more complex projects that require third-party dependencies, I would use and maintain a module file.

package main

import (
	"bufio"
	"flag"
	"io/ioutil"
	"log"
	"os"
	"sort"
	"strings"
)

func main() {
	//  take all the text files in the current directory and flatten them into one text file,
	//  sorted alphabetically and remove duplicates
	path := flag.String("path", ".", "path to the folder containing text files to flatten and sort")
	flag.Parse()

The last two lines of main shown above define and parse the flag that a user can pass when running this program from the command line.

The flag package helps us write simple command line interfaces. If we were doing something that required even one more flag, I might consider using cobra, which makes developing command line interfaces a breeze. Here, in the flag.String call, we name the flag, provide a default value, and supply a message that explains the purpose of the flag to the CLI user. The call returns a pointer to a string, which is filled in once flag.Parse runs. You can read more about the flag package here. The default value . refers to the current working directory, so if the binary is run with no flags, it looks in the directory it was started from.
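To make the flag handling concrete, here is a minimal sketch that parses the same -path flag from an explicit slice of arguments. The helper name parsePath and the use of flag.NewFlagSet are assumptions made for this illustration; the script itself calls flag.String and flag.Parse directly.

```go
package main

import (
	"flag"
	"fmt"
)

// parsePath is a hypothetical helper: it parses a -path flag from args and
// falls back to "." (the current working directory) when the flag is absent.
func parsePath(args []string) (string, error) {
	fs := flag.NewFlagSet("text-file-flattener", flag.ContinueOnError)
	path := fs.String("path", ".", "path to the folder containing text files to flatten and sort")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// fs.String returns a *string, so dereference it to get the parsed value
	return *path, nil
}

func main() {
	withFlag, _ := parsePath([]string{"-path", "reports"})
	noFlag, _ := parsePath(nil)
	fmt.Println(withFlag) // prints "reports"
	fmt.Println(noFlag)   // prints "."
}
```

Parsing from a slice instead of os.Args keeps the example easy to try out without building a binary first.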

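// ioutil.ReadDir still works, but it has been deprecated since Go 1.16 in favor of os.ReadDir
files, err := ioutil.ReadDir(*path)
if err != nil {
    log.Fatal(err)
}

var textFileNames []string

for _, file := range files {
    // match on the suffix so names such as "notes.txt.bak" are skipped, and ignore directories
    if !file.IsDir() && strings.HasSuffix(file.Name(), ".txt") {
        textFileNames = append(textFileNames, file.Name())
    }
}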

This section of code reads the directory supplied by the path flag, declares an empty slice to hold the file names, iterates over the directory entries, and appends the name of each .txt file to the slice. We check for the .txt extension because we can’t be sure that only text files will be in the intended directory, and we only want to operate on files that hold text we can manipulate.
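The same filtering step can be sketched with the newer os.ReadDir API. This is an illustration rather than the script’s own code: the helper names isTextFile and textFiles are made up here, and the sketch returns full paths via filepath.Join instead of bare file names.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// isTextFile (hypothetical helper) reports whether a directory entry
// should be treated as a text file: not a directory, extension ".txt".
func isTextFile(name string, isDir bool) bool {
	return !isDir && filepath.Ext(name) == ".txt"
}

// textFiles (hypothetical helper) returns the paths of the .txt files
// that sit directly inside dir.
func textFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir) // available since Go 1.16
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, entry := range entries {
		if isTextFile(entry.Name(), entry.IsDir()) {
			// join with dir so the result can be opened from any working directory
			paths = append(paths, filepath.Join(dir, entry.Name()))
		}
	}
	return paths, nil
}

func main() {
	paths, err := textFiles(".")
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Comparing filepath.Ext against ".txt" is stricter than a substring match, so a file such as notes.txt.bak is left out.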

The Meat

dupeCheck is a map of string keys to boolean values. finalAllowList is the slice of strings that holds the output of our operations. We range over the file names and open each file at the provided path.

dupeCheck := make(map[string]bool)
finalAllowList := []string{}

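for _, fileName := range textFileNames {
    // Open file and create scanner on top of it.
    // The names are relative to the -path folder, so prefix them with it.
    file, err := os.Open(*path + string(os.PathSeparator) + fileName)
    if err != nil {
        log.Fatal(err)
    }

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        if _, ok := dupeCheck[scanner.Text()]; !ok {
            finalAllowList = append(finalAllowList, scanner.Text())
            sort.Strings(finalAllowList)
        }

        dupeCheck[scanner.Text()] = true
    }

    // Scan also returns false on a read error, so check which case ended the loop
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }

    // close right away instead of deferring, so files do not stay open until main returns
    file.Close()
}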

We open the file to use its contents. The result of the open file API is a file that conforms to the io.Reader interface, so we can pass it as a parameter to bufio.NewScanner in order to produce a scanner, which makes reading files very easy. A scanner is a concept in many programming languages, the idea being that it gives developers a way to parse through a buffered stream of bytes. You can read more about the bufio package here.

We then scan in a for loop that ends when the scanner reaches the end of the stream. The default split function breaks the input on \n (and drops a trailing \r if there is one), so on each iteration scanner.Text() returns the next line. The if block inside the loop checks whether the map already has a key for that line. If it does not, we append the text to the final allow list and sort that list alphabetically. We then store that string as a key with a true value, and the iteration ends. This ensures that we only have one copy of each line of text even if multiple files contain the same line of text.
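The scan-and-deduplicate loop can be tried out in isolation. The sketch below is a simplified stand-in, not the script’s code: the function name uniqueLines is hypothetical, and it scans in-memory strings through strings.NewReader instead of opened files.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// uniqueLines (hypothetical helper) scans each input line by line and
// returns every distinct line once, in the order it was first seen.
func uniqueLines(inputs ...string) ([]string, error) {
	seen := make(map[string]bool)
	var lines []string
	for _, input := range inputs {
		scanner := bufio.NewScanner(strings.NewReader(input))
		for scanner.Scan() {
			line := scanner.Text()
			if !seen[line] {
				seen[line] = true
				lines = append(lines, line)
			}
		}
		// Scan stops on errors as well as at the end of the input
		if err := scanner.Err(); err != nil {
			return nil, err
		}
	}
	return lines, nil
}

func main() {
	// the first input uses Windows line endings, the second has no final newline
	lines, err := uniqueLines("b\r\na\r\n", "a\nc")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Printf("%q\n", lines) // prints ["b" "a" "c"]
}
```

Because the default split function strips the line ending, the same text with \r\n and with \n counts as one line.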

We sort each time we add to the list, so the list is in order after every append. This is convenient but not fast: each call re-sorts the entire slice, so sorting once after all the files have been read would do less work. I have not tested this piece of code at a greater scale than about 10 files with roughly 500 lines of text each, and at that size it finishes in an acceptable time.
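For larger inputs, two alternatives to sorting on every append can be sketched as follows. Both helper names, sortOnce and insertSorted, are made up for this illustration and are not part of the script.

```go
package main

import (
	"fmt"
	"sort"
)

// sortOnce (hypothetical helper) returns a sorted copy of lines,
// sorting a single time after everything has been collected.
func sortOnce(lines []string) []string {
	sorted := append([]string(nil), lines...)
	sort.Strings(sorted)
	return sorted
}

// insertSorted (hypothetical helper) places s at its ordered position in an
// already sorted list and skips it when it is already present.
func insertSorted(list []string, s string) []string {
	i := sort.SearchStrings(list, s) // binary search for the insertion index
	if i < len(list) && list[i] == s {
		return list // duplicate, nothing to insert
	}
	list = append(list, "")    // grow by one element
	copy(list[i+1:], list[i:]) // shift the tail one slot to the right
	list[i] = s
	return list
}

func main() {
	lines := []string{"pear", "apple", "fig", "apple"}

	fmt.Println(sortOnce(lines)) // prints [apple apple fig pear]

	var ordered []string
	for _, line := range lines {
		ordered = insertSorted(ordered, line)
	}
	fmt.Println(ordered) // prints [apple fig pear]
}
```

sortOnce leaves duplicate handling to a separate map check, while insertSorted keeps the list ordered at every step and rejects duplicates as a side effect of the binary search.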

Last Steps

We open a file again, this time with os.OpenFile, which takes a few more parameters than os.Open. We give it the name of the file that should be created, flags that control how the file is opened, and a permission mode. I chose flags that create the file if it does not exist and open it for writing only. The 0644 mode applies when the file is created: the owner can read and write it, and everyone else can only read it. You can read more about the os package here.

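// O_TRUNC empties an existing output.txt so no stale lines from an earlier run are left behind
file, err := os.OpenFile("output.txt", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)

if err != nil {
    log.Fatalf("failed creating file: %s", err)
}

datawriter := bufio.NewWriter(file)

for _, data := range finalAllowList {
    //  since this app is for windows usage, we need to add \r
    _, _ = datawriter.WriteString(data + "\r\n")
}

// a bufio.Writer remembers the first write error, and Flush returns it
if err := datawriter.Flush(); err != nil {
    log.Fatal(err)
}
file.Close()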

In the code above we initialize a new buffered writer from the bufio package and point it at the file we opened in the previous step. We range over the slice of final allow list strings and write each one. The WriteString API returns an int equal to the number of bytes written, as well as an error that explains why fewer bytes were written than the length of the data, if that happens. We add the carriage return character \r before \n because some Windows programs won’t show a line break otherwise. We finish by flushing the writer, which pushes any buffered data to the file, and closing the file. Note that closing the file explicitly right after the flush is preferred here over deferring the file close.
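A version of the write step that checks the returned errors can be sketched like this. The helper names writeCRLF and renderCRLF are hypothetical, and the sketch writes into a strings.Builder instead of output.txt so it can be run without touching the disk.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// writeCRLF (hypothetical helper) writes each line followed by \r\n through
// a buffered writer and reports the first error it runs into.
func writeCRLF(w io.Writer, lines []string) error {
	bw := bufio.NewWriter(w)
	for _, line := range lines {
		if _, err := bw.WriteString(line + "\r\n"); err != nil {
			return err
		}
	}
	// Flush pushes any data still sitting in the buffer to w
	return bw.Flush()
}

// renderCRLF (hypothetical helper) runs writeCRLF against an in-memory
// buffer and returns what was written.
func renderCRLF(lines []string) (string, error) {
	var sb strings.Builder
	err := writeCRLF(&sb, lines)
	return sb.String(), err
}

func main() {
	out, err := renderCRLF([]string{"apple", "fig"})
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("%q\n", out) // prints "apple\r\nfig\r\n"
}
```

Since writeCRLF accepts any io.Writer, the same function would work with the *os.File returned by os.OpenFile.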

Building a .exe binary

This code is good to go, except for building the binary. Go can cross-compile binaries for many operating systems and architectures from a single machine. We just have to set the GOOS and GOARCH environment variables to match the machine we intend to run our binary on. Here that is Windows on 32-bit x86 (386).

GOOS=windows GOARCH=386 go build -o text-file-flattener.exe text-file-flattener.go

If you run this command in your terminal, it should output the exe file that is then runnable on a Windows machine!

Final Thoughts

Scripting is fun with Go, especially when you don’t have many constraints to worry about! 😉 Code should make our lives easier, not harder. When we think about leveraging code to do the menial work, we free ourselves to consider solutions to more difficult problems (which then results in more code, YUCK! 😜). So the next time you find yourself wondering how life could change while ordering and sorting text lines manually, try writing a simple script!


Draco