[Update 2019-11-11: Sources moved to GitHub; verbose flag added to picking script; HTML listing script includes stack weights and probabilities, and indicates whether a book is owned on Kindle.]
This is an updated version of a "geekery" post from last year. I've made substantial changes to the script it describes since then. I'm leaving the former (and much simpler) version in place, but also wanted to show off my new version.
But one thing hasn't changed at all: it's another example of the mental aberration that causes me to write Perl scripts to solve life's little everyday irritants. In this case, two of them:
- I noticed that I had a lot of books on my shelves, acquired long ago, that I never got around to reading, either because (a) they were dauntingly long and dense (I'm thinking about Infinite Jest by David Foster Wallace) or because (b) they just fell through the cracks. Both poor excuses, but there you are.
- I sometimes want to methodically read a series of books in a particular order.
In other words, I needed a way to bring diligence and organization to my previously chaotic and sloppy reading habits.
I think of what I came up with as the "To-Be-Read" (hereafter TBR) database. That's a slightly lofty title, but anyway:
The high-level view: all the TBR books are in zero or more stacks, each stack containing zero or more titles. Each stack is maintained in the order I want to read the books therein. (This goes back to the issue mentioned above: sometimes a series really "should" be read in publishing order, for example C.J. Box's novels featuring protagonist Joe Pickett.)
So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping" the top book from the chosen stack. Very computer science-y.
The interesting part is the "choosing an eligible stack" step. There are a number of possible ways to do it. But first, more details on "eligibility".
The major problem with the previous version of this script was that too often it would pick a book "too soon" after I'd read something off the same stack. (An issue mentioned in last year's post.) As it turns out, I wanted to let some time go by between picks from the same stack. (For example, at least 30 days between books by Heinlein. Too much of a good thing, too soon…)
So: in this version, each stack has an "age": the time that's elapsed since I previously picked a book from that stack. And a "minimum age", the amount of time that must elapse after a pick before that stack becomes eligible again.
Another minor difference: I don't actually own some of the books in some of the stacks yet. I want to read them someday, but I'm waiting, typically for the price to come down, either via the Barnes & Noble remainder table or the Amazon used market. I'm retired on a fixed income, after all.
So an eligible stack is one that:
- is non-empty;
- has an owned book on top;
- is older than its specified minimum age.
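The three eligibility rules translate directly into a predicate. What follows is a rough Python sketch, not the actual Perl script: the `Stack` record, its field names, and the placeholder title are all hypothetical, and whether a stack exactly at its minimum age counts as eligible is a guess (strict "older than" is used here).

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Stack:
    # Hypothetical in-memory form of one stack; names are illustrative.
    name: str
    min_age_days: int              # minimum days between picks
    last_picked: date              # date of the most recent pick
    books: list = field(default_factory=list)  # [(title, owned), ...], top first

    def age_days(self, today: date) -> int:
        """Days elapsed since a book was last picked from this stack."""
        return (today - self.last_picked).days

def is_eligible(stack: Stack, today: date) -> bool:
    if not stack.books:                        # 1. non-empty
        return False
    _title, owned = stack.books[0]
    if not owned:                              # 2. top book is owned
        return False
    # 3. older than its minimum age (strict ">" is a guess at the boundary)
    return stack.age_days(today) > stack.min_age_days

heinlein = Stack("Robert A. Heinlein", 30, date(2018, 6, 19), [("Title A", True)])
print(is_eligible(heinlein, date(2018, 7, 1)))   # False: only 12 days old
print(is_eligible(heinlein, date(2018, 8, 1)))   # True: 43 days old
```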
Given the eligible stacks, there are several ways to choose one:
- Pick the "oldest" stack: the one for which it's been the longest time since a book from it was previously picked.
- Pick the highest stack, the one with the most titles therein. (Because it needs the most work, I guess.)
- Just pick a stack at random.
- Pick a random stack weighted by stack height. That is, any stack can be picked, but one with eight titles in it is twice as likely to be picked as one with four titles. (This was the algorithm used in the previous version.)
- Pick a random stack, weighted by age. That is, a stack that's 90 days old is twice as likely to be picked as a 45-day-old one.
- What I'm actually doing is a combination of the last two: the stack-weighting function is the stack height times the stack age. So (for example) a 120-day-old stack with 5 titles is twice as likely to be picked as a 50-day-old stack with 6 titles, because 120 * 5 = 600 and 50 * 6 = 300. This is totally arbitrary, but it seems to work for me so far.
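Each of the strategies in the list above boils down to either a max() or a weight function of a stack's height and age. Here is a small Python sketch of them; the function names and the (name, height, age) tuple shape are hypothetical, chosen only for illustration:

```python
# Hypothetical weight functions, one per random strategy listed above.
# Each takes a stack's height (number of titles) and age (days since last pick).

def weight_uniform(height, age):
    return 1                      # plain random pick

def weight_by_height(height, age):
    return height                 # the previous version's scheme

def weight_by_age(height, age):
    return age

def weight_height_times_age(height, age):
    return height * age           # the combination currently in use

# The two deterministic strategies are just a max() over (name, height, age).
def pick_oldest(stacks):
    return max(stacks, key=lambda s: s[2])

def pick_highest(stacks):
    return max(stacks, key=lambda s: s[1])

# Worked example from the text: 5 titles x 120 days vs. 6 titles x 50 days.
print(weight_height_times_age(5, 120), weight_height_times_age(6, 50))  # 600 300
```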
Here's my current take on scripting that.
Each stack is implemented as a comma-separated values (CSV) file, headerless, one line per book, each line containing two fields:
- The book title;
- Whether I own the book yet (1/0 = yes/no).
moore.csv, containing the to-be-read books of Christopher Moore:
"The Serpent of Venice",1
That is, three books: the first two owned, the third (Noir) not yet purchased. (I'll get it someday, and edit the file to change the 0 to 1.)
[Added: in addition to 0/1, 'K' indicates that the book's Kindle version is owned. This is just a convenience in case I go looking for it long after actually buying it.]
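Reading a stack file of that shape takes only a few lines. This is a Python sketch using the standard `csv` module rather than the Perl Text::CSV code; the `read_stack` name is hypothetical, and treating 'K' as "owned" for eligibility purposes is my assumption:

```python
import csv
import io

def read_stack(lines):
    """Parse a headerless stack file into [(title, owned), ...], top book first."""
    books = []
    for title, flag in csv.reader(lines):
        # '1' = owned, '0' = not yet bought, 'K' = owned on Kindle.
        # Counting 'K' as owned is this sketch's assumption.
        books.append((title, flag in ("1", "K")))
    return books

# In practice: with open("moore.csv", newline="") as f: books = read_stack(f)
sample = io.StringIO('"The Serpent of Venice",1\n"Noir",0\n')
print(read_stack(sample))  # [('The Serpent of Venice', True), ('Noir', False)]
```

Like the real files, this assumes no blank lines: an empty row would fail to unpack into two fields.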
There is a "master" CSV file, stacks.csv. It has a header (for some reason that I forget). Each non-header line contains data for a single stack:
- The (nice human-readable) stack name;
- The stack ID (corresponding to the name of the stack file);
- The minimum time, in days, that should elapse between consecutive picks from that stack;
- The date when a book was most recently picked from the stack.
"Chronicles of Amber",amber,42,2018-04-15
"Conservative Lit 101",conservative_lit_101,60,2017-09-07
"Robert A. Heinlein",heinlein,30,2018-06-19
No comments from the peanut gallery about my lack of literary taste, please.
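The master file can be read the same way, with each stack's age computed from its last-picked date. Again, this is a Python sketch, not the Perl original. The real header line isn't shown above, so this code just skips the first row without relying on its column names, and the header in the sample is a hypothetical stand-in:

```python
import csv
import io
from datetime import date

def read_master(lines, today):
    """Parse stacks.csv: a header row, then (name, id, min_age_days, last_picked)."""
    rows = csv.reader(lines)
    next(rows)  # skip the header; its column names aren't needed here
    stacks = []
    for name, stack_id, min_age, last_picked in rows:
        age = (today - date.fromisoformat(last_picked)).days
        stacks.append({"name": name, "id": stack_id,
                       "min_age": int(min_age), "age": age})
    return stacks

sample = io.StringIO(
    "name,id,min_age,last_picked\n"  # hypothetical header; the real one isn't shown
    '"Chronicles of Amber",amber,42,2018-04-15\n'
    '"Robert A. Heinlein",heinlein,30,2018-06-19\n'
)
for s in read_master(sample, today=date(2018, 7, 1)):
    print(s["id"], s["age"], s["age"] > s["min_age"])
# amber 77 True
# heinlein 12 False
```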
Picking a random stack according to a weighting function isn't hard. I'd pseudocode the algorithm like this:
Let T be the total weight, W[0] + W[1] + ⋯ + W[N-1].
Pick a random integer r between 0 and T-1 (inclusive).
p = 0
while (r >= W[p])
    r -= W[p]
    p++
… and on loop exit, p is the index of the picked stack.
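For the curious, here is the same subtract-and-advance scan as a runnable Python sketch (the picking script itself is Perl). The function name is hypothetical; weights are assumed to be non-negative integers with a positive sum, matching the integer-only approach described further down:

```python
import random

def weighted_pick(weights, rng=random):
    """Return index p with probability weights[p] / sum(weights).

    Weights are non-negative integers with a positive sum; the loop is
    the same subtract-and-advance scan as the pseudocode.
    """
    total = sum(weights)
    r = rng.randrange(total)      # uniform integer in 0 .. total-1
    p = 0
    while r >= weights[p]:
        r -= weights[p]
        p += 1
    return p

# Rough check: weights 600 vs. 300 should come out about 2:1.
rng = random.Random(2019)
counts = [0, 0]
for _ in range(30_000):
    counts[weighted_pick([600, 300], rng)] += 1
print(counts)
```

Zero-weight entries are never picked: since r >= 0 always holds, the loop simply steps over them.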
To anticipate CS pedants: I know this is O(N) and using a binary search instead could make it O(log N). In practice, it's plenty fast enough. And other steps in the process are O(N) anyway.
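For completeness, the binary-search variant can be sketched in Python with prefix sums and the standard `bisect` module (the `make_picker` name is hypothetical). As noted, building the prefix sums is itself O(N), so this only pays off when many picks are drawn from the same set of weights:

```python
import bisect
import itertools
import random

def make_picker(weights, rng=random):
    """Weighted picker using prefix sums plus binary search.

    Building the prefix sums is O(N) once; each pick is then O(log N).
    """
    cumulative = list(itertools.accumulate(weights))   # e.g. [2, 3, 1] -> [2, 5, 6]
    total = cumulative[-1]

    def pick():
        r = rng.randrange(total)                        # 0 .. total-1
        # The first index whose running total exceeds r is the pick.
        return bisect.bisect_right(cumulative, r)
    return pick

pick = make_picker([600, 300, 300])
print([pick() for _ in range(10)])
```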
The "picking" script, bookpicker, is here.
The -v ("verbose") flag will output a list of each stack's pick probability.
The Text::CSV Perl module is used for reading and writing CSV files. Modules like Time::Seconds are invaluable for the simple age calculations and comparisons.
- You just run the script with no arguments (other than the optional -v); output is the title and the name of the picked stack.
- The user is responsible for maintaining the CSV files: no blank or duplicate lines, etc. I use My Favorite Editor (vim), but CSVs are also editable with Your Favorite Spreadsheet.
- For the picked stack, the script rewrites the stack file without the picked title; the old version is saved with .old appended to its name. The stacks.csv file is also updated with today's date in the picked stack's last-picked field.
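The bookkeeping after a pick looks roughly like this in Python. It's a sketch, not the Perl code: `pop_stack`, `mark_picked`, the header names, and the placeholder titles are hypothetical. It assumes stacks.csv's second column is the stack ID and its fourth is the last-picked date, as listed earlier. Python's default CSV quoting only quotes titles that need it, which is still valid CSV:

```python
import csv
import os
import shutil
import tempfile
from datetime import date

def pop_stack(path):
    """Remove and return the top (first) book of a headerless stack file.

    The previous contents are kept as <path>.old. Only called on
    eligible (hence non-empty) stacks.
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    shutil.copyfile(path, path + ".old")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows[1:])
    return rows[0]

def mark_picked(master_path, stack_id, today):
    """Set the last-picked date (4th column) of one stack in stacks.csv."""
    with open(master_path, newline="") as f:
        rows = list(csv.reader(f))
    for row in rows[1:]:          # rows[0] is the header
        if row[1] == stack_id:
            row[3] = today.isoformat()
    with open(master_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Demo on throwaway files; titles and header names are placeholders.
with tempfile.TemporaryDirectory() as d:
    stack = os.path.join(d, "heinlein.csv")
    master = os.path.join(d, "stacks.csv")
    with open(stack, "w") as f:
        f.write('"Title A",1\n"Title B",0\n')
    with open(master, "w") as f:
        f.write('name,id,min_age,last_picked\n"Robert A. Heinlein",heinlein,30,2018-06-19\n')
    picked = pop_stack(stack)
    mark_picked(master, "heinlein", date(2018, 8, 1))
    with open(stack, newline="") as f:
        remaining = list(csv.reader(f))
    with open(master, newline="") as f:
        updated = list(csv.reader(f))
    backup_exists = os.path.exists(stack + ".old")
print(picked, remaining, updated[1])
```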
The weighting function and random number generation are constrained to integer values; I think it would work without that, but who wants to worry about rounding errors? Not I.
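Integer weights also make the pick probabilities (the kind of thing a verbose listing reports) exact. A Python sketch using the standard `fractions` module; the function name and the three weights are hypothetical, not real output from the script:

```python
from fractions import Fraction

def pick_probabilities(weights):
    """Exact pick probability per stack: weight / total, kept as fractions.

    With integer weights nothing is lost to floating-point rounding, and
    the probabilities sum to exactly 1.
    """
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

# Hypothetical weights (height x age) for three eligible stacks.
probs = pick_probabilities({"amber": 600, "heinlein": 300, "moore": 300})
for name, p in probs.items():
    print(name, p, f"{float(p):.1%}")
# amber 1/2 50.0%
# heinlein 1/4 25.0%
# moore 1/4 25.0%
```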
A script that produces plain text output (on stdout) is here.
A script that produces an HTML page and displays it in my browser (Google Chrome) is here.
It uses text color to signify eligible/ineligible stacks and
Sample output (again, comments on my literary taste, or lack thereof, are
The HTML::Template module is used to make output generation easier, and the template used for that is here.
Getting it to show up in my browser is accomplished via the chromix-too server/client/extension; if you don't have it, it's pretty easy to do something else instead.
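As one example of "something else": Python's standard-library `webbrowser` module can open a local file in the default browser. The sketch below uses plain string formatting in place of HTML::Template, green/gray as a guess at the eligible/ineligible colors, and a hypothetical (name, height, age, eligible) tuple shape. The stack heights in the demo are made up:

```python
import html
import pathlib
import tempfile
import webbrowser

def render(stacks):
    """Build a minimal HTML table; eligible stacks in green, others in gray.

    `stacks` is a list of (name, height, age_days, eligible) tuples.
    """
    rows = []
    for name, height, age, eligible in stacks:
        color = "green" if eligible else "gray"
        rows.append(
            f'<tr style="color:{color}"><td>{html.escape(name)}</td>'
            f"<td>{height}</td><td>{age}</td></tr>"
        )
    return ("<html><body><table>\n"
            "<tr><th>Stack</th><th>Height</th><th>Age (days)</th></tr>\n"
            + "\n".join(rows) + "\n</table></body></html>\n")

def show(page):
    """Write the page to a temp file and open it in the default browser."""
    path = pathlib.Path(tempfile.mkdtemp()) / "tbr.html"
    path.write_text(page, encoding="utf-8")
    webbrowser.open(path.as_uri())

# Heights here are made up for illustration.
page = render([("Chronicles of Amber", 3, 77, True),
               ("Robert A. Heinlein", 4, 12, False)])
```

Calling `show(page)` would then display the listing.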
Whew! I feel better getting this off my chest.