YAML in Go: Parsing nested YAML using the ghodss/yaml library

This article aims to shed light on something I found very confusing as a newcomer to Go, and as a programmer somewhere between beginner and intermediate in general: parsing YAML for use in a Go application.

I needed to extract data from a series of MongoDB conf files. Such a file might look like this:

# Ansible: mongodb Configuration File

processManagement:  
   fork: false

net:  
   port: 27017

systemLog:  
   destination: file
   path: "/data/logs/mongodb.log"
   logAppend: true

storage:  
   dbPath: /data/db
   engine: mmapv1
   mmapv1:
     preallocDataFiles: true
     nsSize: 16
   journal:
      enabled: true

replication:  
   replSetName: prodreplica00
   oplogSizeMB: 200

As you can see, this is a nested YAML file. To parse it and get the data I need, I have to go multiple levels deep. Coming from Python, I assumed this would be simple: read the data into a dictionary and then just do, for instance:

port = dict['net']['port']

But as far as I was able to determine, Go doesn't work quite the same way: it is statically typed, so the shape of the data has to be declared somewhere. After a little research, I found a lovely library for unmarshaling YAML and JSON in Go, ghodss/yaml. It actually converts YAML to JSON first and then unmarshals it. I thought this would get me to Python dict-like bliss, but the reality turned out to be somewhat more complex than that.
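
For comparison, a dict-style lookup is possible in Go, it is just clumsier than in Python. The following is only a sketch and not part of the backup tool: it uses the standard library's encoding/json on a JSON equivalent of the config (a fair stand-in, since the library converts YAML to JSON before unmarshaling), and the names sampleJSON and lookupPort are made up for the illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sampleJSON is a JSON equivalent of part of the mongodb.conf shown above.
// (Hypothetical name, used only for this sketch.)
const sampleJSON = `{
  "net": {"port": 27017},
  "storage": {"dbPath": "/data/db"},
  "replication": {"replSetName": "prodreplica00"}
}`

// lookupPort walks a generic map roughly the way Python walks a dict.
// Every level needs a type assertion because the values are interface{}.
func lookupPort(data []byte) (int, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(data, &doc); err != nil {
		return 0, err
	}

	netSection, ok := doc["net"].(map[string]interface{})
	if !ok {
		return 0, errors.New("net is missing or not a mapping")
	}

	// encoding/json decodes JSON numbers into float64 when the target is interface{}.
	port, ok := netSection["port"].(float64)
	if !ok {
		return 0, errors.New("net.port is missing or not a number")
	}

	return int(port), nil
}

func main() {
	port, err := lookupPort([]byte(sampleJSON))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(port) // prints 27017
}
```

Two assertions for a single value two levels deep is a lot of ceremony, which is why declaring the expected shape up front tends to be the more comfortable route in Go.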

Those who look at the ghodss/yaml repo will notice that it does contain an example, but it only shows extraction of data a single level deep in a YAML document. It took me a lot of googling and searching GitHub for projects that import the library, but eventually I figured out how it worked! Taking the mongodb.conf example from above, I'd like to extract the dbPath, port, and replSetName for use in a backup tool that I'm writing.

First, replconf.go:

package replconfig

import (  
    "io/ioutil"

    "github.com/crielly/mongosnap/logger"
    "github.com/ghodss/yaml"
)

// Config describes the configuration of a MongoD process
type Config struct {  
    Net struct {
        Port int `json:"port"`
        BindIP string `json:"bindIp"`
    } `json:"net"`
    Storage struct {
        DbPath string `json:"dbPath"`
    } `json:"storage"`
    Replication struct {
        ReplSetName string `json:"replSetName"`
    } `json:"replication"`
}

// ReplConfig unmarshals the YAML from a mongodb.conf file
func ReplConfig(configPath string) (replconf Config, err error) {

    y, err := ioutil.ReadFile(configPath)
    if err != nil {
        logger.Error.Println(err)
        // Return early: otherwise Unmarshal below would overwrite the read error
        return replconf, err
    }

    err = yaml.Unmarshal(y, &replconf)
    if err != nil {
        logger.Error.Println(err)
    }

    return replconf, err
}
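
The Config type above uses anonymous nested structs, one per YAML level. The same shape can also be written with named types, which may be handier if a section needs to be passed around on its own. This is a sketch under assumptions rather than the tool's actual code: the type names NetConfig, StorageConfig, ReplicationConfig and NamedConfig are invented here, and encoding/json is used directly on a JSON equivalent of the file so the example runs without third-party packages.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical named types, one per nesting level of mongodb.conf.
type NetConfig struct {
	Port   int    `json:"port"`
	BindIP string `json:"bindIp"`
}

type StorageConfig struct {
	DbPath string `json:"dbPath"`
}

type ReplicationConfig struct {
	ReplSetName string `json:"replSetName"`
}

// NamedConfig mirrors the article's Config, but with named field types.
type NamedConfig struct {
	Net         NetConfig         `json:"net"`
	Storage     StorageConfig     `json:"storage"`
	Replication ReplicationConfig `json:"replication"`
}

// namedSampleJSON is a JSON equivalent of the relevant mongodb.conf sections.
const namedSampleJSON = `{
  "net": {"port": 27017},
  "storage": {"dbPath": "/data/db"},
  "replication": {"replSetName": "prodreplica00", "oplogSizeMB": 200}
}`

// parseNamed decodes the document into the nested named types.
func parseNamed(data []byte) (NamedConfig, error) {
	var cfg NamedConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseNamed([]byte(namedSampleJSON))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Storage.DbPath, cfg.Net.Port, cfg.Replication.ReplSetName)
}
```

Either style decodes the same way; the struct tags are what tie each field to its key.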

The key is the nested structs. For each level, you define a struct that names the pieces of data you want to extract from the JSON objects that the library converts your YAML into. Once you see it in action, it's pretty straightforward. Using the data is equally straightforward:

package command

import (  
    "flag"
    "fmt"
    "sync"
    "time"

    "github.com/crielly/mongosnap/backconfig"
    "github.com/crielly/mongosnap/logger"
    "github.com/crielly/mongosnap/lvm"
    "github.com/crielly/mongosnap/replconfig"
    "github.com/crielly/mongosnap/s3upload"
    "github.com/mitchellh/cli"
)

// Backup command performs a Mongo backup
type Backup struct {  
    BackConfYamlPath string
    UI               cli.Ui
}

// Run the backup
func (b *Backup) Run(args []string) int {

    cmdFlags := flag.NewFlagSet("backup", flag.ContinueOnError)
    cmdFlags.Usage = func() {
        b.UI.Output(b.Help())
    }

    cmdFlags.StringVar(&b.BackConfYamlPath, "confpath", "backconfig/mongosnap.yml", "Path to YAML MongoSnap config")

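    if err := cmdFlags.Parse(args); err != nil {
        logger.Error.Println(err)
        return 1
    }

    // confpath is the path to the mongodb.conf file; how it is obtained is not shown in this excerpt
    replconf, err := replconfig.ReplConfig(confpath)

    if err != nil {
        logger.Error.Println(err)
        return 1
    }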

    dbpath := replconf.Storage.DbPath
    port := replconf.Net.Port
    replsetname := replconf.Replication.ReplSetName

And there you have it: multi-level YAML parsing. As you can see, you don't need to define every value that could be present in a YAML doc; the library just ignores any values you don't explicitly define in a struct.
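
To see that behaviour in isolation, here is a small sketch. It is an illustration under assumptions, not code from the backup tool: it feeds a JSON equivalent of the config to the standard library's encoding/json (the decoder the library hands off to after converting YAML to JSON), and the names partialConfig, fullJSON and parsePartial are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// partialConfig declares only a few of the keys present in the document.
type partialConfig struct {
	Net struct {
		Port   int    `json:"port"`
		BindIP string `json:"bindIp"`
	} `json:"net"`
	Storage struct {
		DbPath string `json:"dbPath"`
	} `json:"storage"`
}

// fullJSON carries more sections and keys than partialConfig knows about,
// and it has no net.bindIp at all.
const fullJSON = `{
  "processManagement": {"fork": false},
  "net": {"port": 27017},
  "systemLog": {"destination": "file", "logAppend": true},
  "storage": {"dbPath": "/data/db", "engine": "mmapv1", "journal": {"enabled": true}}
}`

// parsePartial decodes the document; undeclared keys are silently skipped.
func parsePartial(data []byte) (partialConfig, error) {
	var cfg partialConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parsePartial([]byte(fullJSON))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// BindIP was declared but absent from the input, so it keeps its zero value.
	fmt.Printf("port=%d dbPath=%s bindIp=%q\n", cfg.Net.Port, cfg.Storage.DbPath, cfg.Net.BindIP)
	// prints: port=27017 dbPath=/data/db bindIp=""
}
```

The flip side is worth keeping in mind: a declared field that is missing from the file is left at its zero value without any error, so an empty string or a 0 port may mean "not set" rather than an actual setting.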