Generating this site on Zig 0.16
These are some rough notes on updating a small project for Zig 0.16. There are bound to be misunderstandings and general confusion, since I'm not taking a disciplined approach to learning the language or keeping up with its roadmap.
The program that generates this site is written in Zig (I rewrote it from Rust because I prefer Zig's minimalism and it seemed like a fun challenge), a programming language that's still undergoing breaking changes with each new version. Before I finished the rewrite, version 0.15 brought Writergate, which changed how the language provides file descriptor-based reading and writing. The new interfaces require a buffer, explicit flushing to drain that buffer, and care to always pass a pointer to the writer structure to downstream clients instead of copying it. That last point caused one of the more painful debugging sessions I've had with Zig (not helped by lldb being inexplicably non-functional on my machine at the time).
feat: update to Zig 0.15
8 files changed, 213 insertions(+), 211 deletions(-)
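For illustration, this is the 0.15-era shape of writing to a file as I remember it (a sketch from memory; details of the API may be off):

```zig
const std = @import("std");

pub fn main() !void {
    var file = try std.fs.cwd().createFile("out.txt", .{});
    defer file.close();

    // The new interface wants a caller-supplied buffer...
    var buf: [4096]u8 = undefined;
    var file_writer = file.writer(&buf);
    // ...and downstream code must receive a *pointer* to the writer,
    // never a copy, or the buffered state silently diverges.
    const w = &file_writer.interface;

    try w.print("hello from 0.15\n", .{});
    try w.flush(); // nothing reaches the file until the buffer is drained
}
```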
In April 2026, Zig updated to 0.16 and revamped its standard library's approach to handling I/O.
Functions now create or observe side effects outside of the currently-executing CPU's register file and memory (globals and stack memory are allowed implicitly, while the heap's side effects go through a std.mem.Allocator) by accepting a std.Io argument.
This includes file or network I/O, entropy and random numbers, synchronization primitives, and even reading a monotonic clock.
The effect of this new requirement on my site was significant (the commit prefix I used here, compared to 0.15's, reflects the slog this was):
chore: update to Zig 0.16
10 files changed, 542 insertions(+), 518 deletions(-)
Most of the changes just required plumbing std.Io through to all the functions that needed it.
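In miniature, the plumbing looks something like this (the function names here are mine; the `io: std.Io` parameter threading through every call is the point):

```zig
const std = @import("std");

// Every function that touches the outside world now takes `io`,
// and passes it down to anything it calls that also does I/O.
fn generateSite(allocator: std.mem.Allocator, io: std.Io) !void {
    try processNotes(allocator, io);
    try writeIndex(allocator, io);
}

fn processNotes(allocator: std.mem.Allocator, io: std.Io) !void {
    _ = allocator;
    _ = io; // file reads/writes, clock reads, etc. all go through `io`
}

fn writeIndex(allocator: std.mem.Allocator, io: std.Io) !void {
    _ = allocator;
    _ = io;
}
```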
Zig tooling
It still took me a few hours to update the site by hand. The most frustrating part of the process was the language server not showing errors and the compiler only showing the next reachable error. I'm not sure why the language server couldn't detect obvious errors, like incorrect standard library interfaces. The compiler's friction is due to an explicit design decision: it only compiles code that is used.

In practice, this means you need a persistent split window to run zig build --watch alongside an editor.
Before language servers, I would use vim's quickfix support for this or maybe even Acme's right-click to go to a line.
But in the last five years, I've gotten used to a language server showing diagnostics.
In Helix, you can list them all with <space>d and move between them with [d and ]d.
But there's no support for feeding the output of a compiler into this UI.
So I stuck with the --watch approach.

It would be lovely if each Zig release came with a source-to-source translation tool of some kind, like Hare often includes.
Multithreading
A reason for all of this pain is to provide a swappable runtime for Futures and async/await in the standard library.
My site generator, despite being almost completely I/O bound, was single-threaded while I waited for Zig's concurrency story to be more fleshed out.
But adding concurrency here was significantly more difficult than the previous iteration of the site written in Rust.
In Rust, I think I just adopted Rayon's par_iter to process notes, slapped on a Mutex where the compiler told me there was shared mutable state, and then the change worked.
Zig is much looser with temporal and data race safety, so I had to take more care.
The closest thing I could find to "generate a bunch of work that should happen concurrently" was a std.Io.Group.
I reworked a few of the algorithms to not rely on state that would need to be shared.
For instance, indexing words needed to add them to a single hash table during Markdown conversion.
In the multithreaded code, each note gets its own hash table, and the tables are merged together once all notes are processed.
I decided to use a lock on the diagnostics list, since those should be much rarer.
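A sketch of the merge step (the names here are hypothetical, using the unmanaged hash-map API):

```zig
const std = @import("std");

// Each note builds its own word table; once every task has finished,
// fold them into the single global index.
fn mergeIndexes(
    allocator: std.mem.Allocator,
    global: *std.StringHashMapUnmanaged(u32),
    per_note: []std.StringHashMapUnmanaged(u32),
) !void {
    for (per_note) |*local| {
        var it = local.iterator();
        while (it.next()) |entry| {
            const gop = try global.getOrPut(allocator, entry.key_ptr.*);
            if (!gop.found_existing) gop.value_ptr.* = 0;
            gop.value_ptr.* += entry.value_ptr.*;
        }
    }
}

// The diagnostics list, by contrast, stays shared behind a lock:
var diagnostics_mutex: std.Thread.Mutex = .{};
```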
The std.Io.Group code I wrote looks like this:
var group: std.Io.Group = .init;
for (0..self.notes.items.len) |i| {
    group.async(io, processAsync, .{
        io,
        allocator,
        self,
        &processing,
        i,
        notes_dir_in,
        notes_dir_out,
        index,
        diagnostics,
    });
}
try group.await(io);
Where processAsync is just a wrapper around process that can only fail with error{Canceled} and hides any underlying errors with catch unreachable.
Clearly not production-quality code, but I was too lazy to add an out-parameter for the actual errors.
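Its shape, with a simplified argument list (`Site` stands in for my generator's type; a sketch, not the exact code):

```zig
// Wrapper that narrows the error set for group.async: real failures
// from `process` are (lazily) swallowed instead of being reported.
fn processAsync(
    io: std.Io,
    allocator: std.mem.Allocator,
    self: *Site,
    i: usize,
) error{Canceled}!void {
    self.process(io, allocator, i) catch |err| switch (err) {
        error.Canceled => return error.Canceled,
        else => unreachable, // the too-lazy-for-an-out-parameter part
    };
}
```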
Despite that, these hacks did give an immediate speedup to the end-to-end latency (this is from the debug build):
notes: 97 ( 400.239 KiB) 37.078 ms ( 73.1%)
/ md: 97 ( 400.239 KiB) 67.998 ms (183.4%)
This is saying that overall notes processing took 37ms, but the total time spent working on Markdown files was 67ms, indicating a bit of parallelism. But my computer has 8 CPUs, not 2.
std.Io.Group is probably the wrong primitive here.
Looking at the trace in Instruments, the main thread can "eagerly" take on work to execute (running the passed-in function) if all of the other CPUs' threads are still working on a function.
Because the loop around notes is responsible for enqueuing new work items, this can starve those threads for milliseconds.
I'm not sure why this behavior exists when group.await could just as well start pulling work items without interfering this way.
And using group.concurrent instead of group.async just overcommits dozens of threads, each of which gets its own tiny slice of CPU.

What I need here is an async I/O implementation that takes a batch of operations to execute and does work-stealing.
In Swift or C on macOS, I would reach for Dispatch.concurrentPerform, which is still not a great answer for asynchronous I/O.
I'm not sure this is what std.Io.Batch is meant for, despite the name.
I could simulate this with a pipeline that puts the Markdown conversion in between two std.Io.Queues for Markdown text and then HTML to write.
At these small durations, I'm wary of the overheads eating into the useful work being done.
Or maybe std.Io.Select is the right way to ensure the operations don't interfere with enqueuing work.
The documentation around these isn't very clear to me from just reading the standard library reference. (If anyone with more experience with Zig's concurrency primitives has a path I could take, I'd love to hear from you.)
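The pipeline idea, in the shape I have in mind, sketched with plain std.Thread primitives since I haven't pinned down std.Io.Queue's actual API (every name here is made up):

```zig
const std = @import("std");

// A bounded blocking queue. Converter workers would pop Markdown from
// one of these and push rendered HTML onto a second, which a writer
// thread drains. The sketch assumes the queue never fills.
fn Queue(comptime T: type, comptime cap: usize) type {
    return struct {
        const Self = @This();
        mutex: std.Thread.Mutex = .{},
        not_empty: std.Thread.Condition = .{},
        buf: [cap]T = undefined,
        head: usize = 0,
        len: usize = 0,
        done: bool = false,

        fn push(self: *Self, item: T) void {
            self.mutex.lock();
            defer self.mutex.unlock();
            self.buf[(self.head + self.len) % cap] = item;
            self.len += 1;
            self.not_empty.signal();
        }

        fn close(self: *Self) void {
            self.mutex.lock();
            defer self.mutex.unlock();
            self.done = true;
            self.not_empty.broadcast();
        }

        // Returns null once the queue is closed and empty.
        fn pop(self: *Self) ?T {
            self.mutex.lock();
            defer self.mutex.unlock();
            while (self.len == 0) {
                if (self.done) return null;
                self.not_empty.wait(&self.mutex);
            }
            const item = self.buf[self.head];
            self.head = (self.head + 1) % cap;
            self.len -= 1;
            return item;
        }
    };
}
```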
In any case, I applied this approach to a few other places in the generator that seemed like they would benefit:
3ms: Writing the backlinks (the notes that link to the current note, which show up at the bottom of each note) in parallel made that go from ~5ms to ~2ms.
0ms: Stemming (converting full words into a form that preserves their meaning but avoids unnecessary uniqueness, such as removing the trailing "s" from plural nouns, among (many) other heuristics for English) for the index showed up as an expensive single-threaded task, but moving it into the word writer (during individual note processing) didn't help.
5ms: 7ms of index writing went down to around 2ms when done in parallel.
2ms: Converting pages (like the colophon) and copying static assets in parallel helped by a couple milliseconds.
The site originally generated in ~45ms on my MacBook Air M2 and now finishes in just under 20ms. I should be able to get this to sub-10ms with the right concurrency design, but at this point there aren't any easy wins left.
Despite my initial stumbles, I'm still really excited for the future of concurrency in Zig!