My Book Picker (and Lister)

2022 Version


Introduction/Rationale

This blogpost describes the most recent version of my "book picking" system. It assumes a Linux operating system, and uses Perl. Some "extra" Perl modules are used, not in the core Perl distribution: Const::Fast, version, and HTML::Template. (My current distribution, Fedora, makes installing these modules pretty easy. Your mileage… etc.) Files are available at GitHub.

I've used this system for a number of years, and have tweaked it significantly over that period.

But one thing hasn't changed at all: it's another example of the mental aberration that causes me to write Perl scripts to solve life's little everyday irritants. In this case two little irritants:

  1. I noticed that I had a lot of books on my shelves, acquired long past, that I never got around to reading. Either because (a) they were dauntingly long and dense (I'm thinking about Infinite Jest by David Foster Wallace); or because (b) they just fell through the cracks. Both poor excuses, but there you are.

  2. I sometimes want to methodically read (or reread) a series of books in a particular order.

In other words, I needed a way to bring diligence and organization to my previously chaotic and sloppy reading habits.

I think of what I came up with as the "To-Be-Read" (hereafter TBR) database. That's a slightly lofty title, but anyway:

High-Level View

All the TBR books are in zero or more stacks, each stack containing zero or more titles. Each stack contains a list of books, maintained in the order I want to read them. (This goes back to the issue mentioned above: sometimes a series really "should" be read in publishing order, for example C.J. Box's novels featuring protagonist Joe Pickett.)

So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping" the top book from the chosen stack. Very computer science-y.

The interesting part is the "choosing an eligible stack" step. There are a number of possible ways to do it. But first, more details on "eligibility".

  • "Obviously" you can't pick a book off an empty stack. So a stack with no books in it is ineligible. (Why are there empty stacks? Because I might want to add one or more books to them someday. Like if Steve Hamilton ever writes another book.)
  • The stacks also contain books I don't have yet. I want to read them someday. But I'm waiting. Maybe a book has been announced but not released yet. (Example below.) Or I'm waiting for the price to come down, either via the Barnes & Noble remainder table or the Amazon used market. I'm RetiredOnAFixedIncome, after all. So: If the top book on a stack is unowned, there's no point in picking it. Hence, that stack is ineligible.
  • One final tweak: I found that I didn't want to read a book "too soon" after just reading a previous book in the stack. So each stack has an "age": the time that's elapsed since I previously picked a book from that stack. And a "minimum age", the amount of time that must elapse after a pick before that stack becomes eligible again.

Executive summary: an eligible stack is one that:

  • is non-empty;
  • has an owned book on top;
  • is older than its specified minimum age.

OK, so how do we choose among eligible stacks? Possibilities:

  1. Pick the "oldest" stack: the one for which it's been the longest time since a book from it was previously picked.
  2. Pick the highest stack, the one with the most titles therein. (Because it needs the most work, I guess.)
  3. Just pick a stack at random.
  4. Pick a random stack weighted by stack height. That is, any stack can be picked, but one with eight titles in it is twice as likely to be picked as one with four titles. (This was the algorithm used in the previous version.)
  5. Pick a random stack, weighted by age. That is, a stack that's 90 days old is twice as likely to be picked as a 45-day-old one.
  6. But what I'm doing is a combination of the last two: the stack-weighting function is the stack height times the stack age. So (for example) a 120-day-old stack with 5 titles is twice as likely to be picked as a 50-day-old stack with 6 titles. Because 120 * 5 = 600 and 50 * 6 = 300. This is totally arbitrary, but it seems to work for me so far.
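
The eligibility rules and the height-times-age weight can be sketched in a few lines of Perl. This is a minimal illustration, not the code from the actual scripts: the sub names (age_in_days, is_eligible, weight), the two sample stacks, and the choice of ">=" for the minimum-age test are all assumptions, and "today" is passed in as a string so the example gives the same answer whenever it is run.

```perl
use strict;
use warnings;
use Time::Piece;

# Age of a stack in whole days, given its 'lastpicked' date and a
# reference date, both in YYYY-MM-DD form.
sub age_in_days {
    my ($lastpicked, $today) = @_;
    my $then = Time::Piece->strptime($lastpicked, '%Y-%m-%d');
    my $now  = Time::Piece->strptime($today,      '%Y-%m-%d');
    return int(($now->epoch - $then->epoch) / 86_400);
}

# Eligible: non-empty, top book owned, and at least 'minage' days old.
# (Whether the real test is ">" or ">=" is an assumption here.)
sub is_eligible {
    my ($stack, $today) = @_;
    my @books = @{ $stack->{books} };
    return 0 unless @books;
    return 0 unless $books[0]{owned};
    return age_in_days($stack->{lastpicked}, $today) >= $stack->{minage} ? 1 : 0;
}

# Weight = stack height times stack age.
sub weight {
    my ($stack, $today) = @_;
    return scalar(@{ $stack->{books} }) * age_in_days($stack->{lastpicked}, $today);
}

# Two made-up stacks mirroring the example: 120 days x 5 titles, 50 days x 6 titles.
my $today  = '2022-05-01';
my @stacks = (
    { name => 'Stack A', minage => 30, lastpicked => '2022-01-01',
      books => [ map { +{ title => "A$_", owned => 1 } } 1 .. 5 ] },
    { name => 'Stack B', minage => 30, lastpicked => '2022-03-12',
      books => [ map { +{ title => "B$_", owned => 1 } } 1 .. 6 ] },
);

my @eligible = grep { is_eligible($_, $today) } @stacks;
my $total = 0;
$total += weight($_, $today) for @eligible;
printf "%s: weight %d, probability %.3f\n",
    $_->{name}, weight($_, $today), weight($_, $today) / $total
    for @eligible;
```

With those dates, Stack A weighs 600 and Stack B weighs 300, so A should come up about two-thirds of the time.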

Now, on to the gory details.

The data file ~/var/bookstacks.pl

Previous versions of the system used CSV files to store all this data. I've switched over to a single file (~/var/bookstacks.pl), containing executable Perl code that is used to initialize an array of hashes named @STACKS. At a high level, it looks like:

@STACKS = (
     {
         hash elements for stack 0
     },
     {
         hash elements for stack 1
     },
     …
     {
         hash elements for stack N-1
     },
);

(It Is No Coincidence that this resembles output from the standard Perl Data::Dumper module. See below.)

Each @STACKS array element is a hash. Here's the (actual, as I type) entry for my Michael Connelly stack:

    …
    {
      'name' => 'Michael Connelly',
      'minage' => 30,
      'lastpicked' => '2021-08-21',
      'books' => [
		   {
		     'title' => 'The Dark Hours',
		     'author' => 'Michael Connelly',
		     'ASIN' => 'B08WLRG1L2',
		     'owned' => 1
		   },
		   {
		     'title' => 'Desert Star',
		     'author' => 'Michael Connelly',
		     'ASIN' => 'B09QKSLPN9',
		     'owned' => 0
		   }
		 ]
    },
    …

In words: this @STACKS element contains the stack's name ("Michael Connelly"); the stack's minimum age before becoming eligible (30 days); the date the stack was previously picked (August 21, 2021); and the books currently in the stack. (There are two, The Dark Hours, which I own on Kindle, and Desert Star, not out until November 2022, hence unowned.)

(Yes, that's a subarray of hashes inside the outer array of hashes. Why are you looking at me like that?)

(And no, I haven't memorized the rules about when/whether to use […], {…}, or (…). After decades of Perl coding, I still crack open the Perl Data Structures Cookbook or peruse my existing code to see if I've done something similar that worked in the past.)
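
For anyone else who has to look it up every time, here is a small sketch of getting at the pieces of that structure. It uses a lexical copy of the Michael Connelly entry shown above; the variable names ($stack, $top, $height) are just for illustration.

```perl
use strict;
use warnings;

# (...) is the outer list, {...} builds an anonymous hash,
# and [...] builds an anonymous array.
my @STACKS = (
    {
        name       => 'Michael Connelly',
        minage     => 30,
        lastpicked => '2021-08-21',
        books      => [
            { title => 'The Dark Hours', author => 'Michael Connelly',
              ASIN  => 'B08WLRG1L2',     owned  => 1 },
            { title => 'Desert Star',    author => 'Michael Connelly',
              ASIN  => 'B09QKSLPN9',     owned  => 0 },
        ],
    },
);

my $stack  = $STACKS[0];                    # a hash reference
my $top    = $stack->{books}[0];            # the book on top of the stack
my $height = scalar @{ $stack->{books} };   # how many books are in the stack

print "$stack->{name}: $height books, top is '$top->{title}'\n";
for my $book (@{ $stack->{books} }) {
    printf "  %-15s %s\n", $book->{title}, $book->{owned} ? 'owned' : 'not owned';
}
```

The arrows between adjacent subscripts are optional, so $STACKS[0]{books}[1]{title} reaches the second book's title directly.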

A complete file (my actual version as of April 2022) is here. No comments from the peanut gallery about my lack of literary taste, please.

I named it with a .pl extension because some editors will use that as a hint to do Perl syntax highlighting. It can be read into a script with Perl's do command. For example…

booklister

The booklister script (here) is the simplest script in the system. It reads the data file described above and displays its contents in a (slightly) more verbose and readable form. It also prints, for each eligible stack, its weight and pick-probability.
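
Reading the data file with do can be sketched as follows. This is not the actual booklister code: the load_stacks sub is a hypothetical helper, and a throwaway temporary file with a trimmed-down Michael Connelly entry stands in for ~/var/bookstacks.pl so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our @STACKS;    # the data file assigns to this package variable

# Run the data file with do(), reporting parse and read errors.
sub load_stacks {
    my ($file) = @_;
    @STACKS = ();
    my $rv = do $file;
    die "Couldn't parse $file: $@" if $@;
    die "Couldn't read $file: $!"  unless defined $rv;
    return @STACKS;
}

# A temporary stand-in for the real data file.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END';
@STACKS = (
    {
        'name' => 'Michael Connelly',
        'minage' => 30,
        'lastpicked' => '2021-08-21',
        'books' => [
            { 'title' => 'The Dark Hours', 'author' => 'Michael Connelly', 'owned' => 1 },
            { 'title' => 'Desert Star', 'author' => 'Michael Connelly', 'owned' => 0 },
        ],
    },
);
END
close $fh or die "Couldn't close $file: $!";

# List each stack and its books, flagging the unowned ones.
for my $stack (load_stacks($file)) {
    printf "%s (min age %d days, last picked %s)\n",
        @{$stack}{qw(name minage lastpicked)};
    printf "    %s%s\n", $_->{title}, $_->{owned} ? '' : ' [not owned]'
        for @{ $stack->{books} };
}
```

One thing to keep in mind with this approach: do runs whatever Perl is in the file, so it only makes sense for a data file you maintain yourself.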

Sample booklister output is here.

booklister_html

The booklister_html script (here) is slightly more complicated. It uses an HTML::Template template to generate an HTML page of the book stacks. It uses text formatting to display stack eligibility/ineligibility, and whether a book is owned or not. Finally, it generates a nice pie chart to display each eligible stack's pick-probability, using Google Charts code. It saves the generated page in /tmp/booklist.html; example here.

bookpicker

The bookpicker script (here) simply ("simply") sucks in the bookstacks data, filters out the eligible stacks, then picks one of the eligible stacks at (weighted) random. It "pops" the book at the top of the stack (actually uses a Perl shift, because…). And finally, it writes the modified stack data back out to ~/var/bookstacks.pl, saving the previous version with a .old appended to its name.

(Perl's Data::Dumper module is used for that last part. Some tweaks are used to get it usable for initialization and to get the hash keys to print in a non-random order.)
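
The pop-and-write-back step might look something like the sketch below. The sub names are hypothetical, and so are two details: that picking a book resets the stack's lastpicked date, and that the keys are sorted alphabetically (the real file keeps name, minage, lastpicked, books in that order, so the actual script presumably does something fancier with Sortkeys).

```perl
use strict;
use warnings;
use Data::Dumper;

# "Pop" the top book. The top of the stack is the first array element,
# so this is a shift. Resetting 'lastpicked' here is an assumption.
sub take_top_book {
    my ($stack, $today) = @_;
    my $book = shift @{ $stack->{books} };
    $stack->{lastpicked} = $today;
    return $book;
}

# Serialize as "@STACKS = ( ... );" so the file can be read back with do().
sub dump_stacks {
    my (@stacks) = @_;
    local $Data::Dumper::Indent   = 1;    # simple fixed indentation
    local $Data::Dumper::Sortkeys = 1;    # stable (alphabetical) key order
    return Data::Dumper->Dump([ \@stacks ], ['*STACKS']);
}

# Keep the previous version as FILE.old, then write the new one.
sub save_stacks {
    my ($file, @stacks) = @_;
    if (-e $file) {
        rename $file, "$file.old" or die "Couldn't rename $file: $!";
    }
    open my $out, '>', $file or die "Couldn't write $file: $!";
    print {$out} dump_stacks(@stacks);
    close $out or die "Couldn't close $file: $!";
}
```

The '*STACKS' name is what makes Data::Dumper emit an array assignment instead of its default $VAR1 = [...] form.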

If you'd like a little more detail on the picking process, you can add the -v (verbose) flag. Speaking of that, a small digression on…

Picking a random stack according to a weighting function

It's not hard. Just imagine throwing a dart at that pie chart mentioned above. Your probability of hitting any one segment is proportional to its area. So…

I'd pseudocode the algorithm like this:

Given: N eligible stacks (indexed 0..N-1), with W[i] being the calculated weight of the i-th stack (assumed to be an integer) …

Let T be the total weight, W[0] + W[1] + ⋯ + W[N-1]

Pick a random integer r between 0 and T-1, inclusive.

p = 0
while (r >= W[p])
     r -= W[p]
     p++

… and on loop exit p will index the stack picked.
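
In Perl, that pseudocode could be sketched roughly like this. The sub names index_for and pick_weighted are made up for the example, and the weights are assumed to be non-negative integers with at least one of them positive.

```perl
use strict;
use warnings;

# The loop from the pseudocode: given r in 0 .. T-1, walk the weights
# until r falls inside one of them, and return that index.
sub index_for {
    my ($r, @weights) = @_;
    my $p = 0;
    while ($r >= $weights[$p]) {
        $r -= $weights[$p];
        $p++;
    }
    return $p;
}

# Pick an index at random, with probability proportional to its weight.
sub pick_weighted {
    my (@weights) = @_;
    my $total = 0;
    $total += $_ for @weights;
    die "No eligible stacks\n" unless $total > 0;
    return index_for(int(rand($total)), @weights);    # r is 0 .. T-1
}

# Rough check with the 600-versus-300 example: tally a few thousand picks.
my @count = (0, 0);
$count[ pick_weighted(600, 300) ]++ for 1 .. 9000;
print "index 0 picked $count[0] times, index 1 picked $count[1] times\n";
```

The tallies vary from run to run, but index 0 should come up roughly twice as often as index 1.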

To anticipate CS pedants: I know this is O(N) and using a binary search instead could make it O(log N). In practice, it's plenty fast enough. And other steps in the process are O(N) anyway.

Editing the stacks, an apology

But what if you want to add a new stack? Or add books to a stack? Or delete something? Or (just generally) change something?

I don't have any handy scripts for that. It's a hard problem. I, personally, just edit ~/var/bookstacks.pl using My Favorite Editor (that would be vim).

I have some ideas for a user-friendlier script. Maybe someday. Now that I've said it, maybe someday soon.

Whew!

I feel better getting this off my chest.


Last Modified 2022-08-20 6:23 AM EDT