Context: Post #9 in my sequence of private Lightcone Infrastructure memos edited for public consumption.
First, a disclaimer: before you automate something, check whether you can just not do the thing at all. Questioning the Requirements is a step that should always happen before you gleefully systematize a task.
One reason automation bites harder at Lightcone than at other places is that we are an organization trying very hard to stay small and in sync with each other. This plays into the general point of increasing returns to effort. Automating a task that takes up half a full-time-equivalent at a bigger organization is much less valuable than automating a task that takes up half of a Lightcone team member's time.
Now, automating things is great. Machines are cheap. Most of our work is the kind of stuff that can be automated with software. However, there are both a number of common traps associated with automating tasks, and a number of virtues that are particularly helpful guides for automation work.
1. It's OK to automate a part of something.
A blog post that goes viral on Hacker News from time to time is this: https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-the-key-to-gradual-automation/
Excerpting the most relevant parts (and converting the Python script into TypeScript):
Every ops team has some manual procedures that they haven’t gotten around to automating yet. Toil can never be totally eliminated.
Very often, the biggest toil center for a team at a growing company will be its procedure for modifying infrastructure or its procedure for provisioning user accounts. Partial instructions for the latter might look like this:
- Create an SSH key pair for the user.
- Commit the public key to Git and push to master.
- Wait for the build job to finish.
- Find the user’s email address in the employee directory.
- Send the user their private key via 1Password.
- [...]
This perception of futility is the problem we really need to solve in order to escape from these manual slogs. I’ve found an approach that works pretty reliably: do-nothing scripting.
Do-nothing scripting
Almost any slog can be turned into a do-nothing script. A do-nothing script is a script that encodes the instructions of a slog, encapsulating each step in a function. For the example procedure above, we could write the following do-nothing script:

    #!/usr/bin/env ts-node
    import * as readline from "readline/promises";

    async function main(): Promise<void> {
      const username = process.argv[2];
      if (!username) {
        console.error("usage: ts-node script.ts <username>");
        process.exit(1);
      }

      // Print a prompt and wait for the user to answer (or just press Enter).
      const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
      const ask = (prompt: string): Promise<string> => rl.question(prompt);

      console.log(`Run:\n ssh-keygen -t rsa -f ~/${username}`);
      await ask("Press Enter to continue: ");

      console.log(`Copy ~/new_key.pub into user_keys repo, then run:\n git commit ${username}\n git push`);
      await ask("Press Enter to continue: ");

      console.log("Wait for build job at http://example.com/builds/user_keys to finish");
      await ask("Press Enter to continue: ");

      console.log(`Go to http://example.com/directory\nFind the email for user ${username}`);
      const email = await ask("Paste the email address and press enter: ");

      console.log(`Go to 1Password\nPaste ~/new_key contents into a new document\nShare with ${email}`);
      await ask("Press Enter to continue: ");

      console.log("Done.");
      rl.close();
    }

    main();

This script doesn’t actually do any of the steps of the procedure. That’s why it’s called a do-nothing script. It feeds the user a step at a time and waits for them to complete each step manually.
At first glance, it might not be obvious that this script provides value. Maybe it looks like all we’ve done is make the instructions harder to read. But the value of a do-nothing script is immense:
- It’s now much less likely that you’ll lose your place and skip a step. This makes it easier to maintain focus and power through the slog.
- Each step of the procedure is now encapsulated in a function, which makes it possible to replace the text in any given step with code that performs the action automatically.
- Over time, you’ll develop a library of useful steps, which will make future automation tasks more efficient.
A do-nothing script doesn’t save your team any manual effort. It lowers the activation energy for automating tasks, which allows the team to eliminate toil over time.
The blogpost is written in the context of a company with lots of standard operating procedures. We do not have that, for good reason (maybe a future principle should be "against process"). Nevertheless, the basic approach is valuable.
Automating a recurring task often fails because you realize that something about it seems hard for a program to do. But most of the time, some parts of the task can be automated! Putting the task into a context where gradual automation becomes possible (for example, by embedding it in an Airtable automation or a script like the one above) lets you start chipping away at it.
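To make the "chipping away" concrete, here is a rough sketch (not the original post's code) of the same procedure with each step wrapped in a function, and one step, the email lookup, swapped from manual to automated. The names `manualStep`, `runSteps`, and the in-memory `directory` table are all hypothetical stand-ins; a real version would query whatever employee directory you actually have.

```typescript
// Shared state passed from step to step.
type Context = { username: string; email?: string };
// Anything that can show a prompt and return the user's answer.
type Ask = (prompt: string) => Promise<string>;
type Step = { name: string; run: (ctx: Context, ask: Ask) => Promise<void> };

// A manual step only prints instructions and waits for the human to confirm.
function manualStep(name: string, instructions: (ctx: Context) => string): Step {
  return {
    name,
    run: async (ctx, ask) => {
      console.log(instructions(ctx));
      await ask("Press Enter to continue: ");
    },
  };
}

// Hypothetical stand-in for the employee directory.
const directory: Record<string, string> = { alice: "alice@example.com" };

const steps: Step[] = [
  manualStep("create-key", (ctx) => `Run:\n ssh-keygen -t rsa -f ~/${ctx.username}`),
  manualStep("commit-key", (ctx) =>
    `Copy ~/new_key.pub into user_keys repo, then run:\n git commit ${ctx.username}\n git push`),
  manualStep("wait-for-build", () =>
    "Wait for build job at http://example.com/builds/user_keys to finish"),
  {
    // Automated step: look the email up, and only fall back to asking
    // the human when the lookup fails.
    name: "find-email",
    run: async (ctx, ask) => {
      ctx.email = directory[ctx.username] ?? (await ask("Email not found, paste it here: "));
    },
  },
  manualStep("share-key", (ctx) =>
    `Go to 1Password\nPaste ~/new_key contents into a new document\nShare with ${ctx.email}`),
];

// Run every step in order, threading the shared context through.
async function runSteps(all: Step[], ctx: Context, ask: Ask): Promise<Context> {
  for (const step of all) {
    console.log(`\n== ${step.name} ==`);
    await step.run(ctx, ask);
  }
  return ctx;
}
```

To run this interactively, you would pass an `ask` backed by `readline/promises`, as in the script above. Because `ask` is a parameter, the remaining manual steps can each be replaced one at a time without touching the others.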
Often, however, the obstacles to automating something will go deeper, which brings us to the second point:
2. Automating something will often require finding a totally new solution to the underlying problem (and that solution will often be worse, and this will be worth it).
This can definitely be taken too far, but frequently a process or approach to a problem needs to be pretty deeply refactored in order to be amenable to automation. Some examples that come to mind:
Your time is really very valuable. It is almost always OK for a task to be done at a lower standard of quality, if doing so truly frees up your time. This is especially true if the resulting system is truly free of open loops for the organization.
3. Automations are particularly prone to creating zombie-like substructures. Therefore make them visible.
A big issue with automations (and many forms of process in general) is that when a task starts being executed as part of such an automation, the task often loses a lot of inherent flexibility. And even if it doesn't, people will tend to forget the original goal of an automated task, making it much less likely for them to notice if the task has become unnecessary, or less important (or more important, such that bad tradeoffs are being made against quality).
This makes it particularly important for automations to regularly produce messages and notifications about what they are doing. The default place for this is Slack. Daily run reports, messages every time something gets processed, and some weekly or monthly aggregate report of what the automation is doing, are all important so that people don't forget what's going on behind the scenes.
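As a minimal sketch of such a daily run report, the code below formats a summary and posts it to a Slack incoming webhook (which accepts a JSON body with a `text` field). The `RunResult` shape, the function names, and the idea of restating the automation's purpose in every report are my assumptions for illustration, not an existing Lightcone convention. It also assumes a runtime with a global `fetch`, such as Node 18 or later.

```typescript
// Outcome of processing one item. This shape is hypothetical.
type RunResult = { item: string; ok: boolean; note?: string };

// Build a short, human-readable report. Restating the purpose each time
// keeps the original goal of the automation in front of people.
function formatRunReport(automation: string, purpose: string, results: RunResult[]): string {
  const failures = results.filter((r) => !r.ok);
  const lines = [
    `*${automation}* daily report`,
    `Why this exists: ${purpose}`,
    `Processed ${results.length} item(s), ${failures.length} failure(s).`,
    ...failures.map((r) => `• ${r.item}: ${r.note ?? "no details"}`),
  ];
  return lines.join("\n");
}

// Post plain text to a Slack incoming webhook URL.
async function postToSlack(webhookUrl: string, text: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) {
    throw new Error(`Slack webhook returned ${res.status}`);
  }
}
```

A scheduled job would call `postToSlack(url, formatRunReport(...))` at the end of each run, including runs where nothing was processed, so that silence always means the automation is broken rather than idle.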
4. UI design is automation
Much of good UI design is the act of automating away unnecessary parts of a task. A good UI for a task eliminates all the unnecessary work from a task, and leaves only the crucial decisions that the person using the UI is needed for. A confusing UI usually indicates that you are asking the user to make a decision that they do not need to make, that could have been automated for them instead.
Common indicators that you are failing to automate all automate-able parts via a custom UI:
In many cases a better handle for the task of "automating X" is "making a UI for X".
5. Airtable and Slack are your friends
At Lightcone, the default place where (non-LessWrong-related) data should end up is Airtable. The logic for administering the automation should first live in an Airtable automation and, if it gets too complicated, move to our internal infrastructure GitHub repository.
Inasmuch as possible, try to make it so that interacting with the problem can happen fully in Slack. In lieu of that, build the interfaces in Airtable and link to them from Slack. If that still isn't powerful enough, make a custom webapp that gets linked to in Slack messages frequently. Try to avoid processes that do not get triggered by Slack messages, or do not cause updates in Slack when completed.
Slack has surprisingly powerful tools that IMO we are currently underusing. Slack messages can open up modals with complicated custom UI driven by complicated custom data. Messages can have many different buttons that trigger external automations. You might be surprised what you can do with Slack.
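For a sense of what a message with buttons looks like, here is a small sketch that builds a Slack Block Kit payload: a summary section plus one button per decision the human still has to make. The `Decision` type, the function name, and the example action IDs are hypothetical. Handling the click (Slack sends an interaction payload to your app's configured request URL) is not shown.

```typescript
// One choice offered to the human. This shape is hypothetical.
type Decision = { actionId: string; label: string; style?: "primary" | "danger" };

// Build a Block Kit message: the automation does the prep work and the
// message leaves only the final decision to a person.
function buildDecisionMessage(summary: string, recordId: string, decisions: Decision[]) {
  return {
    text: summary, // plain-text fallback used in notifications
    blocks: [
      { type: "section", text: { type: "mrkdwn", text: summary } },
      {
        type: "actions",
        elements: decisions.map((d) => ({
          type: "button",
          text: { type: "plain_text", text: d.label },
          action_id: d.actionId,
          value: recordId, // passed back to the app when the button is clicked
          ...(d.style ? { style: d.style } : {}),
        })),
      },
    ],
  };
}

// Example: ask for a decision on a (made-up) Airtable record.
const message = buildDecisionMessage(
  "New venue booking request from *Alice* for 12 people.",
  "rec123",
  [
    { actionId: "approve_booking", label: "Approve", style: "primary" },
    { actionId: "reject_booking", label: "Reject", style: "danger" },
  ],
);
console.log(JSON.stringify(message, null, 2));
```

The resulting object can be sent as the JSON body of a Slack message. Putting the record ID in each button's `value` lets the handler update the right Airtable row without any further lookup by the person clicking.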