Shitty SSG: Code Walkthrough

One would expect a walk through the code of a self proclaimed “shitty” software project not to be a pleasant read. I promise that it is not that bad. Earlier I described why I built this static site generator and the Asciidoc subset that it uses as a markup language. Here I will describe the code that transforms these documents into a website. Later I will go more into detail on how the flexibility of this generator is used to create and publish to multiple sites from multiple sources in one go.

Originally I made the SSG to create one site. But it turned out to be not that hard to generalize it and now it generates:

In the future I might add an export for a Gopher or a Gemini site and if the collection of notes grows I’ll probably improve the search somewhat. Now it only finds notes on tags or whole words. It displays them just in plain text. I will also probably dive more into shell escape codes and such to see if the apperance can be improved.

The code itself can be found here.

Directories

So how does it work? On a very high level, the directory structure looks like this:

.
├── cmd
│   ├── notes
│   └── ssg
└── pkg
    └── adoc

I arrived there by following the conventions for folders I always follow when starting a new Go program. In this case there are two binaries. One is the notes program, the other the site generator. That one generator can generate both sites, depending on the configuration its fed.

Everything is built on the adoc package, that parses Asciidoc files in the most clumsy way possible. You give it a string, it gives you back a pointer to an ADoc struct that holds all information found in the string.

There is no internal directory. That might have been a mistake. A possible use for this directory is to store packages that contain domain logic that is specific to the programs in the repository. I thought there would not really be any domain specific thing in this project. We read a file, let adoc parse the contents and once we know that we can go straight to the formatting of HTML, I thought. But it turns out there are some rules that must be applied and that is a translation on the kind of document we are processing.

There all kinds of documents. Notes, stories, tutorials, etc. These are indicated by the :kind: metadata property in the Asciidoc document. On the site, there are also different kinds of posts, but these don’t map one on one. Essays and tutorials get lumped together in the more generic article kind on my personal site, fo instance. Once I had the note viewer, I also started to store private notes into the system. But these private notes should only show up in the viewer, not on any site.

In the end, I thought that this translation of “source kind” to “output kind” was specific to the thing we’re outputting to, so the translation should be done at the level of the programs, in cmd/notes and cmd/ssg. But it feels weird and somehow repetitive. Anyway. It works now. The public/private thing still irks me though.

Asciidoc parsing

As said, the parsing of the documents is clumsy and the most amateurish part of this code. I was interested in creating a parser by hand already before this project and enjoyed, for instance, reading Handwritten Parser and Lexers in Go by Ben Johnson. Following his lead, I created a lexer and a parser and started writing a bunch of unit tests to implement the rules and then... I figured out two things.

The first was that I am not a very smart person and that the recursion in the parser kept being confusing. My mistake here was probably that I did not do any background reading on the different types of parsers and how they work. Most documents online talk about grammar and generated parsers, which was not what I wanted. I just wanted to build something from the ground up, like in the article of Ben Johnson. But there are all kinds of possible strategies and structures to parse a text. The blog posts I used for research did mention that, but I kind of glossed over it. I thought I would just take “the easiest” version, without checking whether everyone had the same definition of “easiest”. It did not really help that the examples I found where in different programming languages that were all new to me. In short, I did not have a clear picture of how I wanted to do it.

The second thing was reading the Asciidoc specification. Trying to read it. Because I quickly discovered that it was a lot bigger than I remembered and that it contains a lot of rules. At the start I thought I would be able to unit test myself out of it, but together with the first point I lost faith that I was going to accomplish that.

So I stopped programming for while and started to think about that. This was coming way too big for a side project. Then it hit me. While the whole specification is huge, the portion that I need for my generator is small. I only needed the parts that map to the simple HTML I was going to generate. And looking at HTML, I could break it up even further. HTML consists of block level elements and they contain inline elements. The inline elements were already working in the failed version, the block elements are basically the things separated by an empty line in the document. I might not be smart enough to write a complete parser, I do know how to split a text on empty lines.

In practice, this was a little more complicated. Code blocks can have empty lines, for instance. A List is a block element that contains list items, which are also block elements. But it was not hard to add those once the basics were there. It does not make for the nicest code though.

The inline parsing is just a big state machine with a lot of variables and lots of fiddling to get te behavior I wanted. There are bugs, but it’s usable for me.

Generating a site

The generator itself consists of two parts. There is a Post type and a Posts collection type. The latter supports filtering, sorting, etc. by method chaining. The second part is a type for render functions:

type TemplateConfig struct {
  ...
  Render func(targetPath string, tpl *template.Template, posts Posts, staticPages []*StaticPage) error
}

This takes a go text template, the big Posts collection and a target path. The collection is filtered, sorted, etc. in the render function and is then fed to the template. The template get rendered to the target path.

So what happens is:

The StaticPages work similar. These are the content parts that are not a post, but still are specific to the site. For instance, the Other and About pages on erikwinter.nl. The HTML for that is just stored in a separate folder that is read at startup.

And that is enough. I haven’t looked back since.