Keys, Values and Rules: Three Important Shake Concepts

The title was click-bait! This article will actually try to explain five important notions in Shake, not three.

These are:

Rules
Keys
Values
The Build Database
Actions

This short blog post was inspired by the hurdles I hit with my Shake-based build after a new Shake version with breaking API changes was released.

Jump to the next section if you are not interested in the why and how of this blog post.

Shake is a rule-based build system, much like GNU make. Like make, it is robust; unlike make, it is pretty fast and supports dynamic build dependencies.

But you knew all that already if you are the target audience of this post, since this post is about me explaining to myself, by explaining to you, how the build tool I have used for years actually works.

Although I used it for years, I never read the paper or wrapped my head around it more than absolutely necessary to get the job done.

When Shake was updated to version 0.16.x, the internal API for custom rules was removed. Until then I was using this API with code mindlessly hacked together until it worked, without actually knowing what I was doing.

But this API change forced me into reading the paper and understanding the wording and concepts. This blog post contains a glossary of some of the Shake terms that I finally understood.

It turned out that I didn't actually need to read all the documentation. As I always do, I mindlessly threw bits of code at the project to make it work, and it did not work. I concluded that mindlessly throwing bits and pieces of code at the project was not sufficient, and since the Shake upgrade was the only change I had made, I assumed I had misunderstood enough about how Shake works to justify digging deeper into it.

Of course, I later discovered that the problem that persuaded me to leave the path of ignorance had nothing to do with how I used the Shake API. It is important to understand that the Shake API is pretty neat: once a build program compiles and correctly resolves rules, one can rely on the correctness of the internal mechanisms of Shake.

Important Shake Concepts

Before we start, let me clarify that by build program I mean a Haskell program using Shake that creates a specific set of external outputs from a specific set of (optionally external) inputs by executing arbitrary IO actions, such as invoking external programs.

Rule

A build program basically consists of rules.

A rule maps a key to a build action that creates the actual output artifact the key refers to.

There are two kinds of rules, and there will be an upcoming blog post about extending Shake with custom rules, and that article will explain these two types.

The build action that is stored for a key in a rule must return meta-data that will be kept in a database and passed to the next invocation of that build action.

This bit of meta-data is called a value.

Value

A value is a representation of the content of the artifact generated for a key.

The value is used by the build action to compare old and new build output in order to determine whether the build output is different.
More precisely, it only has to determine whether the output is different enough to justify rebuilding the actions that depend on it.

If, for example, the value represents an executable file generated by a compiler, it is possible to directly use the file contents as the value, but it is often faster and requires less disk space to use some placeholder value like an SHA-1 hash, or maybe even a file system modification timestamp.

Shake applies the build action for that file to "Just" the previous hash, so the action can compare it to the new hash whenever the output file has been rebuilt.

The action returns "ChangedRecomputeSame" if the hashes are equal after a rebuild, and Shake then skips rebuilding the artifacts that depend on that file. It returns "ChangedRecomputeDiff" when the hashes differ, and Shake then also rebuilds the dependent artifacts.

A value should be represented by a custom Haskell data type.
For example:

data OutputFileHash = OutputFileHash Integer
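
To make the comparison described above a bit more tangible, here is a small sketch of the decision in plain Haskell. It is only a model: the Changed type is a local stand-in that borrows the two constructor names mentioned above and is not imported from Shake, and compareValues is a hypothetical helper name, not part of the Shake API.

```haskell
-- Simplified model; nothing here is imported from Shake.
newtype OutputFileHash = OutputFileHash Integer
  deriving (Eq, Show)

-- Local stand-in borrowing the two constructor names used in the text.
data Changed = ChangedRecomputeSame | ChangedRecomputeDiff
  deriving (Eq, Show)

-- Compare the value stored by the previous build (if any) with the
-- value computed by the rebuild that just finished.
compareValues :: Maybe OutputFileHash -> OutputFileHash -> Changed
compareValues (Just old) new
  | old == new = ChangedRecomputeSame
compareValues _ _ = ChangedRecomputeDiff

main :: IO ()
main = do
  print (compareValues (Just (OutputFileHash 42)) (OutputFileHash 42)) -- ChangedRecomputeSame
  print (compareValues (Just (OutputFileHash 42)) (OutputFileHash 43)) -- ChangedRecomputeDiff
  print (compareValues Nothing (OutputFileHash 42))                    -- ChangedRecomputeDiff
```

In this sketch a missing previous value is treated as a difference, which I assume is the safe choice for a first build.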

Key

A key represents a specific artifact to be generated by a build program.
Keys are used to specify build targets and dependencies.

A key should also be represented by a custom Haskell data type.
For example:

data OutputFile = OutputFile FilePath
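
For Shake to accept such types as key and value, they need a handful of type class instances, and the key type has to be linked to its value type. The following is a sketch of how I would declare that, assuming Shake 0.16 or later is installed. It uses newtype instead of data so that the instances can be derived from the wrapped type. The names OutputFile and OutputFileHash are the examples from this post.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}

import Development.Shake (RuleResult)
import Development.Shake.Classes (Binary, Hashable, NFData)

-- The key: identifies one artifact of the build.
newtype OutputFile = OutputFile FilePath
  deriving (Show, Eq, Hashable, Binary, NFData)

-- The value: a compact stand-in for the content of that artifact.
newtype OutputFileHash = OutputFileHash Integer
  deriving (Show, Eq, Hashable, Binary, NFData)

-- Tell Shake which value type belongs to which key type.
type instance RuleResult OutputFile = OutputFileHash

main :: IO ()
main = print (OutputFile "out/app", OutputFileHash 0)
```

The main function only shows that the types compile and can be printed. A real build program would go on to register a rule for OutputFile.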

Build output meta-data database

Shake uses a persistent database, stored in a file, to pass build output meta-data from one build to the next.

This database basically contains a map of keys to results.

After a key has been (re)built, the database entry for that key will be updated with the new result.

A build result contains:

the value
the timestamp of last rebuild
the timestamp of last time the value changed

Results contain enough information to determine whether dependent artifacts need to be rebuilt.
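
Here is a toy model of such a database in plain Haskell, to show how the two timestamps could answer the rebuild question. This is my simplified reading, not Shake's actual implementation: Step is an assumed integer build counter standing in for the timestamps, and recordBuild and needsRebuild are hypothetical helper names.

```haskell
import qualified Data.Map as Map

-- Assumed integer build counter standing in for the timestamps.
type Step = Int

data Result v = Result
  { resultValue   :: v     -- the value
  , resultBuilt   :: Step  -- last rebuild
  , resultChanged :: Step  -- last time the value changed
  } deriving (Eq, Show)

type Database k v = Map.Map k (Result v)

-- Store the outcome of (re)building a key. The "changed" step only
-- moves forward when the new value differs from the stored one.
recordBuild :: (Ord k, Eq v) => Step -> k -> v -> Database k v -> Database k v
recordBuild now key new db = Map.insert key result db
  where
    result = case Map.lookup key db of
      Just old | resultValue old == new -> old { resultBuilt = now }
      _                                 -> Result new now now

-- A target is stale if it was never built, or if one of its
-- dependencies changed after the target was last built.
needsRebuild :: Ord k => Database k v -> k -> [k] -> Bool
needsRebuild db target deps =
  case Map.lookup target db of
    Nothing -> True
    Just r  -> any (changedAfter (resultBuilt r)) deps
  where
    changedAfter built dep =
      maybe True (\d -> resultChanged d > built) (Map.lookup dep db)

db1, db2, db3 :: Database String Integer
db1 = recordBuild 1 "app" 20 (recordBuild 1 "lib.o" 10 Map.empty)
db2 = recordBuild 2 "lib.o" 10 db1  -- rebuilt, same hash
db3 = recordBuild 3 "lib.o" 11 db2  -- rebuilt, different hash

main :: IO ()
main = do
  print (needsRebuild db2 "app" ["lib.o"])  -- False
  print (needsRebuild db3 "app" ["lib.o"])  -- True
```

Rebuilding lib.o with an unchanged hash leaves app alone. Only the changed hash in the third step makes app stale.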

Action

A Shake Action can do one of two things:

Actually do something like invoking ghc or gcc, i.e. perform IO via liftIO
Depend on other artifacts via their keys, which is done by apply or apply1 in Shake.
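
Both kinds of things can be seen in a tiny build program. This is a sketch that assumes the shake package is installed and that a source file name.txt exists in the current directory. Both file names are made up for the example. It uses the built-in file rules, where need is the file-specific function that, as far as I understand, is built on top of apply.

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $ do
  want ["greeting.txt"]

  -- A rule: the key is the output path, the body is the build action.
  "greeting.txt" %> \out -> do
    -- Depend on another artifact via its key.
    need ["name.txt"]
    -- Actually do something, i.e. perform IO via liftIO.
    name <- liftIO (readFile "name.txt")
    liftIO (writeFile out ("Hello, " ++ name))
```

Running the program twice without touching name.txt should skip the rule body the second time, since nothing it depends on has changed.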
