More baking

Seems my last additions to Bakery pulled a bit of a plug, because I have kept adding stuff.

When I talked to Joacim I was reminded of how I fixed a PHP script to import his thousands of Markdown files into WordPress, and I became curious how Bakery would do with a dataset that size (7000 files). Joacim happily sent me the whole lot. I ran Bakery and watched in amusement as it was tripped up by various things it had never encountered in my own carefully restricted file format.

The thing was that all his files (coming from some kind of standardized export) start with a metadata section, whereas my own files simply have the headline as the first line, meaning his posts all came out with the headline "---".

The cool thing was that Bakery actually ran and produced output even the very first time, and it only took tens of seconds to do so. Encouraged and eager to get useful results, I started working on metadata parsing and ended up going to bed a lot later than I had planned.

In the process, I learned to add a try-catch block with error logging inside the function invoked by the multiprocess pool, because otherwise any crash in there would result in a trace which looked like the pool was trying to loop over nonexistent data. That really was the main lesson learned. Plus, a reminder of the usefulness of switching between large and small datasets depending on what I am testing. When trying to trace what happens when parsing metadata, it becomes much easier to sort things out if you limit yourself to one typical file rather than log information for all 7000 of them …
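
For anyone wanting to try the same trick, here is a minimal sketch of what such a guarded worker could look like, assuming Python's multiprocessing.Pool. The names parse_file, safe_worker and bake_all are made up for this illustration and are not Bakery's actual functions; in Python the "try-catch" is spelled try/except.

```python
import logging
from multiprocessing import Pool

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("bakery-sketch")


def parse_file(path):
    # Hypothetical stand-in for the real per-file work.
    if path.endswith(".bad"):
        raise ValueError("cannot parse " + path)
    return path.upper()


def safe_worker(path):
    # Catch everything inside the worker, so the failure is logged
    # together with the file name instead of surfacing later as a
    # confusing trace from the pool machinery.
    try:
        return path, parse_file(path), None
    except Exception as exc:
        log.exception("failed on %s", path)
        return path, None, repr(exc)


def bake_all(paths, processes=2):
    # Run the guarded worker over all paths and split the outcome
    # into successes and failures.
    with Pool(processes) as pool:
        results = pool.map(safe_worker, paths)
    done = [(p, out) for p, out, err in results if err is None]
    failed = [(p, err) for p, out, err in results if err is not None]
    return done, failed
```

Calling bake_all with a single typical file first, and with the whole folder only once that works, fits the small-dataset habit described above. On platforms that spawn worker processes, the call should sit under an if __name__ == "__main__": guard.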

I still have more metadata to put to use, the most important being creation and modification timestamps, but everything is nicely put into a dictionary for when I want it. With tags and titles properly extracted, I had enough to move on and get properly displaying results.
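
A rough sketch of the kind of parsing involved, not Bakery's real code. It assumes the metadata section is a block of simple "key: value" lines between two "---" lines and that tags are comma-separated; the export format may well differ, and split_front_matter and parse_tags are names invented for this example.

```python
def split_front_matter(text):
    """Return (metadata dict, body text).

    Assumes an optional block of 'key: value' lines enclosed by
    '---' lines at the very top of the file.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No metadata section: the file is all body.
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: the rest is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip().lower()] = value.strip()
    # No closing delimiter: treat the whole file as plain body text.
    return {}, text


def parse_tags(meta):
    # Assumption: tags are stored as one comma-separated string.
    return [t.strip() for t in meta.get("tags", "").split(",") if t.strip()]
```

Files without a metadata section fall through unchanged, so the old headline-on-the-first-line format keeps working alongside the exported one.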

Pro tips

Destroying and re-creating 7000 files within seconds kind of scares Dropbox into fully occupying a CPU core.
Even things like grep can get shocked when asked to handle that many files. What you want to do is grep -r on the folder instead of grep sometext *.

Great side effects

All that done, I went ahead and created archive pages. The initial motivation was that I wanted to see how long they would take to generate on each rebuild, as they were the main thing Macpro had which Bakery could not support.

No, I do not intend to become the supplier of a static site generator for Macpro, but it seems to work wonderfully as a motivational MacGuffin.

What I have ended up doing is generating multiple archive pages. There is one with all posts, plus one for each tag I have used. They can be reached by clicking the archive link in the footer of each page, or by clicking a tag on the index page.
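
The grouping behind those pages can be sketched in a few lines. This is an illustration under assumptions, not the code in the repository: posts are taken to be dictionaries with a "tags" list, and group_for_archives plus the "archive" and "tag-" page names are invented here.

```python
from collections import defaultdict


def group_for_archives(posts):
    """Map an archive page name to the posts listed on it.

    One page holds every post, plus one page per tag. The 'tag-'
    prefix keeps a tag called 'archive' from colliding with the
    full archive page.
    """
    pages = defaultdict(list)
    for post in posts:
        pages["archive"].append(post)
        for tag in post.get("tags", []):
            pages["tag-" + tag].append(post)
    return dict(pages)
```

Each key would then become one generated page, with the posts listed in whatever order they were passed in.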

And once I had that full archive page, it felt a bit unnecessary to list all pages on the index page, so I re-did it to display the full content of the last five posts instead.

Kind of catching up to … oh, about 2005?

All the changes are up on GitHub. The single-process version is thoroughly deprecated, and the README should probably get a serious update too.

More importantly: the site rebuilds in 0.8 seconds on my MacBook, I have had great fun and I still want to add more stuff.